Paper Title

When Evidence and Significance Collide

Authors

František Bartoš, Samuel Pawel, Eric-Jan Wagenmakers

Abstract

Null hypothesis statistical significance testing (NHST) is the dominant approach for evaluating results from randomized controlled trials. Although NHST comes with long-run error rate guarantees, its main inferential tool, the $p$-value, is only an indirect measure of evidence against the null hypothesis. The main reason is that the $p$-value is computed under the assumption that the null hypothesis is true, whereas the likelihood of the data under any alternative hypothesis is ignored. If the goal is to quantify how much evidence the data provide for or against the null hypothesis, an alternative hypothesis must be specified (Goodman & Royall, 1988). Paradoxes arise when researchers interpret $p$-values as evidence. For instance, results that are surprising under the null may be equally surprising under a plausible alternative hypothesis, so that a $p = .045$ result ("reject the null") does not make the null any less plausible than it was before. Hence, $p$-values have been argued to overestimate the evidence against the null hypothesis. Conversely, statistically non-significant results (i.e., $p > .05$) may nevertheless provide some evidence in favor of the alternative hypothesis. It is therefore crucial for researchers to know when statistical significance and evidence collide, and this requires that a direct measure of evidence be computed and presented alongside the traditional $p$-value.
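
To make the collision concrete, here is a minimal sketch (not the authors' method) that places a two-sided $p$-value next to a Bayes factor for a single normally distributed estimate. It assumes a z-test with known standard error, a point null H0: effect = 0, and a hypothetical alternative H1 under which the effect follows Normal(0, tau^2), where the prior width tau (in standard-error units) is chosen purely for illustration. All function names are hypothetical; only Python's standard `statistics.NormalDist` and `math` are used.

```python
from math import exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, N(0, 1)


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a z-statistic under H0: effect = 0."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def z_from_p(p: float) -> float:
    """Absolute z-statistic that produces the given two-sided p-value."""
    return STD_NORMAL.inv_cdf(1.0 - p / 2.0)


def bf01_normal_prior(z: float, tau: float) -> float:
    """Bayes factor BF01 = p(data | H0) / p(data | H1) for a z-statistic.

    Illustrative assumptions (not from the paper): the estimate is normal with
    known standard error, so on the z-scale z ~ N(0, 1) under H0: effect = 0,
    and under the hypothetical H1: effect ~ N(0, tau^2) (tau in standard-error
    units) the marginal distribution is z ~ N(0, 1 + tau^2).
    BF01 > 1 favors H0; BF01 < 1 favors H1.
    """
    density_h0 = STD_NORMAL.pdf(z)
    density_h1 = NormalDist(0.0, sqrt(1.0 + tau**2)).pdf(z)
    return density_h0 / density_h1


def min_bf01_point_alternative(z: float) -> float:
    """Smallest BF01 over all point alternatives in this model.

    The likelihood is maximized at effect = z, giving exp(-z^2 / 2): the most
    evidence against H0 that any single point alternative can extract.
    """
    return exp(-z**2 / 2.0)


if __name__ == "__main__":
    for p in (0.045, 0.072):
        z = z_from_p(p)
        print(f"p = {p}: |z| = {z:.3f}, "
              f"min BF01 = {min_bf01_point_alternative(z):.3f}")
        for tau in (1.0, 10.0):
            print(f"    tau = {tau}: BF01 = {bf01_normal_prior(z, tau):.3f}")
```

Under these assumptions, the "significant" $p = .045$ result gives BF01 above 1 when tau = 10 (the data slightly favor the null) and below 1 when tau = 1, while the non-significant $p = .072$ result gives BF01 below 1 when tau = 1, i.e., some evidence for the alternative. Even the most favorable point alternative caps the evidence from $p = .045$ at a BF01 of about 0.13, roughly 7.5 to 1 against the null. The numbers depend entirely on the assumed alternative, which is why it has to be stated explicitly.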