No Estimation without Inference: A Response to the International Society of Physiotherapy Journal Editors
The International Society of Physiotherapy Journal Editors (ISPJE) recently published an editorial warning that many of its journals would soon prohibit the use of null hypothesis tests and instead require authors to interpret 95% confidence intervals relative to clinically important values. Although I encourage reporting confidence intervals and discussing uncertainty in the context of a research question, the ISPJE’s proposed ban is illogical, and the editorial contains several instances of flawed statistical reasoning. In brief, the editorial: (1) fails to adequately grapple with the inherent connection between hypothesis testing and estimation, (2) presents several misleading arguments about the perceived flaws of hypothesis tests, and (3) presents an alternative to hypothesis testing that is itself a form of hypothesis test (the minimal effects test), albeit one done informally. If the editorial’s arguments are taken at face value, statistical literacy in our field will decline and readers will be left with a flawed understanding of p-values. Further, if the editorial’s proposed ban is put into practice, I fear it could decrease the scientific integrity of our research, because it removes quantitative benchmarks in favor of a more subjective interpretation of confidence intervals. Ultimately, I think many of the ISPJE’s concerns that led to the editorial are valid, but those problems result from questionable research practices that stem from poor methodological training for authors, reviewers, and editors. These problems will only be fixed through better and continuing education, not by banning statistically valid methods.
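The link between estimation and testing described in points (1) and (3) can be illustrated with a minimal sketch. It assumes a normally distributed effect estimate with a known standard error (a z-approximation), and the function name, the example numbers, and the clinically important threshold are all hypothetical rather than taken from the editorial or from this article. Under those assumptions, checking whether the lower bound of a two-sided 95% confidence interval exceeds the threshold gives the same decision as a one-sided minimal effects test at alpha = 0.025.

```python
from statistics import NormalDist


def ci_and_minimal_effects_test(estimate, std_error, threshold, alpha=0.05):
    """Return a two-sided CI and a one-sided minimal effects test p-value.

    Assumes a normal sampling distribution (z-approximation); all inputs
    are hypothetical illustration values, not data from any study.
    """
    std_normal = NormalDist()

    # Two-sided (1 - alpha) confidence interval around the estimate.
    critical_value = std_normal.inv_cdf(1 - alpha / 2)
    lower = estimate - critical_value * std_error
    upper = estimate + critical_value * std_error

    # Minimal effects test: H0 says the effect is <= threshold,
    # H1 says the effect is > threshold.
    z_statistic = (estimate - threshold) / std_error
    p_one_sided = 1 - std_normal.cdf(z_statistic)

    return lower, upper, p_one_sided


if __name__ == "__main__":
    alpha = 0.05
    threshold = 5.0  # hypothetical smallest clinically important difference
    for estimate, std_error in [(12.0, 3.0), (12.0, 4.0)]:
        lower, upper, p = ci_and_minimal_effects_test(
            estimate, std_error, threshold, alpha
        )
        ci_excludes_threshold = lower > threshold
        test_rejects = p < alpha / 2
        print(
            f"estimate={estimate}, SE={std_error}: "
            f"CI=({lower:.2f}, {upper:.2f}), p={p:.4f}, "
            f"CI above threshold={ci_excludes_threshold}, "
            f"test rejects={test_rejects}"
        )
```

In both hypothetical cases the interval-based reading and the test reach the same conclusion, which is the sense in which interpreting a confidence interval against a clinically important value is an informal hypothesis test. With small samples a t-distribution would replace the normal approximation, but the correspondence between the interval and the test would be the same.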
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is properly cited.