
What Is a P-Value, What Does It Tell Me, and How Do I Tie It to Clinical Practice?

Laying the Groundwork for a Smarter, Stronger, Clinical Approach

Why Evidence-Based Practice Matters

Welcome to a new ongoing series in Emerging PT focused on Evidence-Based Practice (EBP). In this series, we aim to strengthen your clinical reasoning by breaking down key concepts in research interpretation. With estimates suggesting that the knowledge underpinning clinical practice is replaced or challenged roughly every 7 years, understanding the science behind the evidence is essential for modern physical therapists.

This edition kicks things off with the humble yet powerful p-value. While many practitioners glance at the p-value in an abstract and move on, we’ll unpack what it really means, when to trust it, and how to use it responsibly in clinical decision-making. Especially in the aftermath of debates seen during the COVID-19 pandemic, where clinicians often relied on gut feeling or poorly understood data, it's more important than ever to ground care in solid, interpretable evidence.

Understanding Hypothesis Testing

Hypothesis testing is the framework researchers use to determine whether an intervention has a statistically significant effect. In simple terms, it tests whether the observed effect is likely due to the intervention or merely due to chance.

  • What is a p-value? A p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis, which states that there is no real effect or difference, is true. A small p-value means the observed data would be unlikely if the null hypothesis were true, and is therefore taken as evidence against the null hypothesis and in favor of the alternative hypothesis.

  • Relation to Alpha Level (α)
    The alpha level is the threshold researchers set to determine significance. A common α is 0.05. If the p-value is less than α, the result is considered statistically significant. Essentially, α reflects how willing researchers are to risk a Type I error (false positive).
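As a minimal sketch of this decision rule, here is a two-sample t-test in Python with made-up walking-speed data (the group values and variable names are hypothetical, for illustration only):

```python
# Illustrative sketch: comparing a p-value to a pre-set alpha level.
from scipy import stats

# Hypothetical change in walking speed (m/s) for two groups
new_intervention = [0.31, 0.25, 0.28, 0.35, 0.30, 0.27, 0.33, 0.29]
standard_care = [0.12, 0.18, 0.15, 0.10, 0.14, 0.16, 0.11, 0.13]

alpha = 0.05  # the pre-set willingness to risk a Type I error
t_stat, p_value = stats.ttest_ind(new_intervention, standard_care)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: statistically significant")
else:
    print(f"p = {p_value:.4f} >= {alpha}: not statistically significant")
```

Note that α is chosen before the data are analyzed; the p-value is then simply compared against it.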

The Risk of Error

  • Type I Error (α): Concluding there is a difference when there is none (false positive).

  • Type II Error (β): Concluding there is no difference when there is one (false negative).

Examples:

  • Type I: A study finds that a new balance intervention improves function, but in reality it doesn't; the result was due to chance. Clinicians may then abandon effective traditional care in favor of an intervention that does nothing.

  • Type II: A beneficial strengthening protocol is overlooked because the study didn't find significance due to a small sample size.

An important note is that Type I errors are generally viewed as more serious because they can lead to the adoption of ineffective or even harmful practices².
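A short simulation makes the meaning of α concrete: when both groups come from the same distribution (so no true difference exists), about 5% of tests at α = 0.05 will still come out "significant" purely by chance. This sketch uses simulated data, not data from any real study:

```python
# Simulation sketch: the Type I error rate equals alpha when the null is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_trials = 2000
false_positives = 0

for _ in range(n_trials):
    # Both groups drawn from the SAME distribution: no true difference
    group_a = rng.normal(loc=0.0, scale=1.0, size=20)
    group_b = rng.normal(loc=0.0, scale=1.0, size=20)
    _, p = stats.ttest_ind(group_a, group_b)
    if p < alpha:
        false_positives += 1  # a "significant" result despite no real effect

type_i_rate = false_positives / n_trials
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate hovers near 0.05, which is exactly what α promises: a controlled, but nonzero, rate of false positives.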

Here is an easy graphical depiction of error.

Reality              Conclusion: Difference    Conclusion: No Difference
True Difference      Correct Decision          Type II Error
No True Difference   Type I Error              Correct Decision

Factors Affecting P-Value

  1. Magnitude of Difference between Means: The p-value helps us understand how likely it is to see the results we observed if there were actually no real effect. One major factor influencing this is the magnitude of difference between group means. Larger observed differences make it less likely that results are due to random chance alone, resulting in a smaller p-value. For example, if a new gait intervention improves walking speed by 0.2 m/s compared to standard care, this is more likely to reach statistical significance than a 0.05 m/s improvement. However, it’s important to recognize that statistical significance does not always mean clinical significance. A small difference can still yield a small p-value in large samples, so clinicians must evaluate whether the size of the effect is meaningful in practice³.

  2. Variability in Distributions: Variability refers to the spread or dispersion of data points within each group. Low variability means data points are tightly clustered around the mean, making it easier to detect real differences between groups. High variability, on the other hand, introduces noise that can obscure meaningful effects. This leads to larger standard errors and wider confidence intervals, which in turn inflate the p-value. Sources of variability may include measurement error, diverse participant characteristics, or inconsistent intervention delivery. Clinicians should pay attention to reported standard deviations and confidence intervals in studies, as these help contextualize how precise and reliable the findings are⁴.

  3. Size of Sample: Sample size directly impacts the precision of statistical estimates. Larger sample sizes provide more stable averages, reduce the standard error, and increase the likelihood of detecting true differences. This means that studies with large samples are more likely to yield small p-values when a real effect exists. Conversely, small studies may produce unreliable results due to greater susceptibility to random fluctuations in the data. A small sample may suggest an effect is significant even when it is not, or fail to detect a real effect, simply because it lacks the data to confidently differentiate signal from noise. Because of this, clinicians should be cautious when interpreting findings from small studies and consider whether the sample is large and diverse enough to generalize to their patient population⁵.

    Statistical Power
    Statistical power is the probability that a test will correctly reject a false null hypothesis (i.e., detect a true effect). Power increases with larger sample sizes and lower variability. A study with low power is more likely to miss real differences (Type II error), which could lead us to overlook valuable interventions. Clinically, this underscores the importance of reading studies with adequately powered samples, especially when making decisions for populations with diverse needs.

    Degrees of Freedom (DoF)
    Degrees of freedom represent the number of independent values in a dataset that are free to vary. It is a function of sample size and influences the shape of the sampling distribution used in hypothesis testing. More degrees of freedom, typically resulting from larger sample sizes, improve the reliability of the statistical estimates and increase the credibility of the p-value. In short, higher DoF means we can be more confident that the sample truly reflects the larger population we’re trying to understand.
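The link between sample size and statistical power can be illustrated with a small simulation. Here the same true effect (a hypothetical difference of 0.5 standard deviations) is tested repeatedly at two sample sizes; the larger studies detect it far more often:

```python
# Simulation sketch: larger samples detect the same true effect more often.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def detection_rate(n_per_group, effect=0.5, trials=1000, alpha=0.05):
    """Fraction of simulated studies reaching p < alpha
    when a true effect of `effect` SDs exists (an estimate of power)."""
    hits = 0
    for _ in range(trials):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect, 1.0, n_per_group)
        if stats.ttest_ind(a, b)[1] < alpha:
            hits += 1
    return hits / trials

power_small = detection_rate(n_per_group=10)
power_large = detection_rate(n_per_group=100)
print(f"Estimated power, n=10 per group:  {power_small:.2f}")
print(f"Estimated power, n=100 per group: {power_large:.2f}")
```

The small studies miss the real effect most of the time (Type II errors), while the large studies detect it reliably; this is why underpowered research can cause us to overlook valuable interventions.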

Utilizing the P-Value But Not Relying On It Alone

The p-value is a useful tool, but it should not be the only metric we consider when evaluating research. Misinterpretation or manipulation of p-values can lead to poor clinical decisions. The strength of our clinical practice depends on critical appraisal.

What else should we look at?

  • Study methods

  • Type of research design and references used

  • Specificity and validity of the tools

  • Clinical significance vs. statistical significance
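The last point can be made concrete with a sketch: in a very large simulated sample, a clinically trivial difference (a hypothetical 0.02 m/s in gait speed) still produces a "significant" p-value, which is why an effect size such as Cohen's d is worth checking alongside it:

```python
# Sketch: statistical significance without clinical significance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000  # very large groups
group_a = rng.normal(loc=0.50, scale=0.20, size=n)  # hypothetical gait speeds (m/s)
group_b = rng.normal(loc=0.52, scale=0.20, size=n)  # trivial 0.02 m/s difference

t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Cohen's d: difference in means divided by the pooled standard deviation
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_b.mean() - group_a.mean()) / pooled_sd
print(f"p = {p_value:.4g}, Cohen's d = {cohens_d:.2f}")
```

The p-value clears the 0.05 threshold, yet the effect size is small by conventional benchmarks, and a 0.02 m/s change in gait speed is unlikely to matter to a patient.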

These topics will be explored in future editions as we continue building a strong foundation in EBP.

References:

  1. Kolata G. The covid drug wars that pitted doctor vs doctor. The New York Times. April 12, 2022. https://www.nytimes.com/2022/04/12/health/covid-drugs-doctors.html

  2. Motulsky H. Intuitive Biostatistics. 3rd ed. Oxford University Press; 2014.

  3. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124. 

  4. Sullivan GM, Feinn R. Using effect size—or why the P value is not enough. J Grad Med Educ. 2012;4(3):279-282. 

  5. Altman DG, Bland JM. Statistics notes: standard deviations and standard errors. BMJ. 2005;331(7521):903.

  6. Button KS, Ioannidis JPA, Mokrysz C, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14(5):365-376.

Disclaimer:

We are current Doctor of Physical Therapy (DPT) students sharing information based on our formal education and independent studies. The content presented in this newsletter is intended for informational and educational purposes only and should not be considered professional medical advice. While we strive to provide accurate and up-to-date information, our knowledge is based on our current academic training, clinical rotations, and ongoing learning, not extensive clinical practice.
