Critical Evaluation of Data Requires Rigorous but Broadly Based Statistical Inference
Response by Lem Moyé and Michelle Cohen
The rampant misuse of the P value and of its stated meaning led the American Statistical Association to comment, "Researchers often wish to turn a P value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The P value is neither. It is a statement about data in relation to a specified hypothetical explanation and is not a statement about the explanation itself."1 The American Statistical Association is not alone in its concern: a host of recent literature provides context for the desire to use statistical inference to reinforce rigor and reproducibility in scientific research.2–4 A central focus of this literature is the widely acknowledged and severe limitations we impose on ourselves through blind and naive adherence to the exclusive use of P values for understanding the significance of research findings.
Counterpoint, see p 1046
Response by Moyé and Cohen, see p 1051
The inadequacy of P values as a singular and standard criterion for significance in the analysis of clinical trials has been known and communicated by statisticians for >35 years.5 Given the limitations of significance testing, can we simply, as suggested by Moyé and Cohen, resolve to walk away from P values? Many have argued for the use of methods that emphasize estimation over testing, such as confidence, credibility, or prediction intervals, which can be used in place of or in addition to P values.6–8 Others have lobbied for alternative measures of evidence, such as likelihood ratios or Bayesian methodologies, which leverage a priori information about the probability of different magnitudes of treatment effect and modify the estimates of those effects based on the observed data.9–11 Still others propose approaches such as …