| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Replication crisis | 2/15 | https://en.wikipedia.org/wiki/Replication_crisis | reference | science, encyclopedia | 2026-05-05T03:17:03.520061+00:00 | kb-cron |
A null hypothesis test is a decision procedure which takes in some data and outputs either $H_0$ or $H_1$. If it outputs $H_1$, it is usually stated as "there is a statistically significant effect" or "the null hypothesis is rejected". Often, the statistical test is a (one-sided) threshold test, which is structured as follows:
Gather data $D$. Compute a test statistic $t[D]$ for the data. Compare the test statistic against a critical value/threshold $t_{\text{threshold}}$. If $t[D] > t_{\text{threshold}}$, then output $H_1$; otherwise, output $H_0$. A two-sided threshold test is similar, but with two thresholds, such that it outputs $H_1$ if either $t[D] < t_{\text{threshold}}^{-}$ or $t[D] > t_{\text{threshold}}^{+}$.
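The threshold procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the choice of a one-sample t statistic, the sample data, and the numeric thresholds are all assumptions made for the example (the thresholds are placeholders, not critical values derived for this sample size).

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(data, mu0=0.0):
    """One-sample t statistic for H0: the population mean equals mu0.

    Assumed choice of test statistic; any function of the data would do.
    """
    n = len(data)
    return (mean(data) - mu0) / (stdev(data) / sqrt(n))


def one_sided_test(t_value, threshold):
    """Output H1 if the statistic exceeds the threshold, else H0."""
    return "H1" if t_value > threshold else "H0"


def two_sided_test(t_value, lower, upper):
    """Output H1 if the statistic falls below lower or above upper."""
    return "H1" if (t_value < lower or t_value > upper) else "H0"


if __name__ == "__main__":
    sample = [0.8, 1.1, 0.4, 1.6, 0.9, 1.3]  # hypothetical data D
    t_value = t_statistic(sample, mu0=0.0)
    # Illustrative thresholds only.
    print(one_sided_test(t_value, threshold=1.645))
    print(two_sided_test(t_value, lower=-1.96, upper=1.96))
```

Keeping the statistic and the decision rule as separate functions mirrors the text: the same comparison step applies whatever statistic is computed from $D$.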
There are four possible outcomes of a null hypothesis test: false negative, true negative, false positive, and true positive. A false positive means that $H_0$ is true, but the test outcome is $H_1$; a true negative means that $H_0$ is true, and the test outcome is $H_0$; a false negative means that $H_1$ is true, but the test outcome is $H_0$; and a true positive means that $H_1$ is true, and the test outcome is $H_1$.
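The four outcomes come from crossing the true hypothesis with the test's output, following the standard convention that "positive" refers to an output of $H_1$. A small lookup table makes the mapping explicit; the function name and the string labels are assumptions chosen for illustration.

```python
# Keys are (true hypothesis, test output).
OUTCOMES = {
    ("H0", "H0"): "true negative",
    ("H0", "H1"): "false positive",
    ("H1", "H0"): "false negative",
    ("H1", "H1"): "true positive",
}


def classify_outcome(truth, decision):
    """Name the outcome given which hypothesis holds and what the test output."""
    return OUTCOMES[(truth, decision)]
```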
The significance level, also called the false positive rate or the alpha level, is the probability of finding the alternative to be true when the null hypothesis is true:
$$(\text{significance}) := \alpha := \Pr(\text{find } H_1 \mid H_0)$$
For example, when the test is a one-sided threshold test, then
$$\alpha = \Pr_{D \sim H_0}(t[D] > t_{\text{threshold}})$$
where $D \sim H_0$ means "the data is sampled from $H_0$".
Statistical power, also called the true positive rate, is the probability of finding the alternative to be true when the alternative hypothesis is true:
$$(\text{power}) := 1 - \beta := \Pr(\text{find } H_1 \mid H_1)$$
where $\beta$ is also called the false negative rate. For example, when the test is a one-sided threshold test, then
$$1 - \beta = \Pr_{D \sim H_1}(t[D] > t_{\text{threshold}})$$
Given a statistical test and a data set $D$, the corresponding p-value is the probability, conditional on $H_0$, that the test statistic is at least as extreme as the one observed. For example, for a one-sided threshold test,
$$p[D] = \Pr_{D' \sim H_0}(t[D'] > t[D])$$
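As a worked sketch of the three quantities for a one-sided threshold test, suppose (purely as an assumption for this example) that the test statistic is standard normal under $H_0$ and normal with mean `shift` and unit variance under $H_1$. The function names and the `shift` parameter are hypothetical; only `statistics.NormalDist` from the Python standard library is used.

```python
from statistics import NormalDist

# Assumed sampling distribution of the test statistic under H0.
STD_NORMAL = NormalDist(0.0, 1.0)


def alpha_for_threshold(threshold):
    """Significance level: Pr(t[D] > threshold) when D is sampled from H0."""
    return 1.0 - STD_NORMAL.cdf(threshold)


def power_for_threshold(threshold, shift):
    """Power: Pr(t[D] > threshold) when D is sampled from H1.

    H1 is modelled (by assumption) as the statistic being N(shift, 1).
    """
    return 1.0 - NormalDist(shift, 1.0).cdf(threshold)


def p_value(t_observed):
    """One-sided p-value: Pr(t[D'] > t_observed) for D' sampled from H0."""
    return 1.0 - STD_NORMAL.cdf(t_observed)


if __name__ == "__main__":
    threshold = STD_NORMAL.inv_cdf(0.95)  # about 1.645, giving alpha = 0.05
    print("alpha:", alpha_for_threshold(threshold))
    print("power at shift 2.5:", power_for_threshold(threshold, shift=2.5))
    print("p-value for t = 2.0:", p_value(2.0))
```

With `shift=0.0` the alternative coincides with the null, so the power equals $\alpha$; larger shifts raise the power for the same threshold.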
If the null hypothesis is true, then the p-value is distributed uniformly on $[0, 1]$. Otherwise, it is typically peaked at $p = 0$ and roughly exponential, though the precise shape of the p-value distribution depends on what the alternative hypothesis is. Because the p-values are distributed uniformly on $[0, 1]$ under the null hypothesis, researchers can set any significance level $\alpha$ by computing the p-value and then outputting $H_1$ if $p[D] < \alpha$. This is usually stated as "the null hypothesis is rejected at significance level $\alpha$", or "$H_1\;(p < \alpha)$", such as "smoking is correlated with cancer (p < 0.001)".
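The link between uniform p-values and the significance level can be checked with a small Monte Carlo simulation. This is a sketch under stated assumptions: the test statistic is drawn directly as a standard normal under $H_0$ and as a normal with mean `shift` under $H_1$, and the function names, seeds, and trial counts are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed null distribution of the test statistic


def simulate_p_values(shift, n_trials, seed):
    """Draw test statistics from N(shift, 1) and convert each to a one-sided p-value."""
    rng = random.Random(seed)
    return [1.0 - STD_NORMAL.cdf(rng.gauss(shift, 1.0)) for _ in range(n_trials)]


def rejection_rate(p_values, alpha):
    """Fraction of simulated experiments that output H1, i.e. have p < alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# shift = 0 reproduces H0; shift = 2 is one hypothetical alternative.
null_p = simulate_p_values(shift=0.0, n_trials=20000, seed=0)
alt_p = simulate_p_values(shift=2.0, n_trials=20000, seed=1)

if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.5):
        print(alpha, rejection_rate(null_p, alpha), rejection_rate(alt_p, alpha))
```

Under the null, the rejection rate should track $\alpha$ for every choice of $\alpha$, which is what uniformity on $[0, 1]$ implies; under the alternative, the p-values pile up near zero and the rejection rate is the power.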
=== History ===
The replication crisis dates to a number of events in the early 2010s. Felipe Romero identified four precursors to the crisis: