kb/data/en.wikipedia.org/wiki/Replication_crisis-1.md

627 lines
8.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
title: "Replication crisis"
chunk: 2/15
source: "https://en.wikipedia.org/wiki/Replication_crisis"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T03:45:08.741659+00:00"
instance: "kb-cron"
---
A null hypothesis test is a decision procedure which takes in some data, and outputs either
H
0
{\displaystyle H_{0}}
or
H
1
{\displaystyle H_{1}}
. If it outputs
H
1
{\displaystyle H_{1}}
, it is usually stated as "there is a statistically significant effect" or "the null hypothesis is rejected".
Often, the statistical test is a (one-sided) threshold test, which is structured as follows:
Gather data
D
{\displaystyle D}
.
Compute a test statistic
t
[
D
]
{\displaystyle t[D]}
for the data.
Compare the test statistic against a critical value/threshold
t
threshold
{\displaystyle t_{\text{threshold}}}
. If
t
[
D
]
>
t
threshold
{\displaystyle t[D]>t_{\text{threshold}}}
, then output
H
1
{\displaystyle H_{1}}
, else, output
H
0
{\displaystyle H_{0}}
.
A two-sided threshold test is similar, but with two thresholds, such that it outputs
H
1
{\displaystyle H_{1}}
if either
t
[
D
]
<
t
threshold
{\displaystyle t[D]<t_{\text{threshold}}^{-}}
or
t
[
D
]
>
t
threshold
+
{\displaystyle t[D]>t_{\text{threshold}}^{+}}
There are 4 possible outcomes of a null hypothesis test: false negative, true negative, false positive, true positive. A false negative means that
H
0
{\displaystyle H_{0}}
is true, but the test outcome is
H
1
{\displaystyle H_{1}}
; a true negative means that
H
0
{\displaystyle H_{0}}
is true, and the test outcome is
H
0
{\displaystyle H_{0}}
, etc.
Significance level, false positive rate, or the alpha level, is the probability of finding the alternative to be true when the null hypothesis is true:
(
significance
)
:=
α
:=
P
r
(
find
H
1
|
H
0
)
{\displaystyle ({\text{significance}}):=\alpha :=Pr({\text{find }}H_{1}|H_{0})}
For example, when the test is a one-sided threshold test, then
α
=
P
r
D
H
0
(
t
[
D
]
>
t
threshold
)
{\displaystyle \alpha =Pr_{D\sim H_{0}}(t[D]>t_{\text{threshold}})}
where
D
H
0
{\displaystyle D\sim H_{0}}
means "the data is sampled from
H
0
{\displaystyle H_{0}}
".
Statistical power, true positive rate, is the probability of finding the alternative to be true when the alternative hypothesis is true:
(
power
)
:=
1
β
:=
P
r
(
find
H
1
|
H
1
)
{\displaystyle ({\text{power}}):=1-\beta :=Pr({\text{find }}H_{1}|H_{1})}
where
β
{\displaystyle \beta }
is also called the false negative rate. For example, when the test is a one-sided threshold test, then
1
β
=
P
r
D
H
1
(
t
[
D
]
>
t
threshold
)
{\displaystyle 1-\beta =Pr_{D\sim H_{1}}(t[D]>t_{\text{threshold}})}
.
Given a statistical test and a data set
D
{\displaystyle D}
, the corresponding p-value is the probability that the test statistic is at least as extreme, conditional on
H
0
{\displaystyle H_{0}}
. For example, for a one-sided threshold test,
p
[
D
]
=
P
r
D
H
0
(
t
[
D
]
>
t
[
D
]
)
{\displaystyle p[D]=Pr_{D'\sim H_{0}}(t[D']>t[D])}
If the null hypothesis is true, then the p-value is distributed uniformly on
[
0
,
1
]
{\displaystyle [0,1]}
. Otherwise, it is typically peaked at
p
=
0.0
{\displaystyle p=0.0}
and roughly exponential, though the precise shape of the p-value distribution depends on what the alternative hypothesis is.
Because the p-values are distributed uniformly on
[
0
,
1
]
{\displaystyle [0,1]}
under the null hypothesis, researchers can set any significance level
α
{\displaystyle \alpha }
by computing the p-value, then output
H
1
{\displaystyle H_{1}}
if
p
[
D
]
<
α
{\displaystyle p[D]<\alpha }
. This is usually stated as "the null hypothesis is rejected at significance level
α
{\displaystyle \alpha }
", or "
H
1
(
p
<
α
)
{\displaystyle H_{1}\;(p<\alpha )}
", such as "smoking is correlated with cancer (p < 0.001)".
=== History ===
The replication crisis dates to a number of events in the early 2010s. Felipe Romero identified four precursors to the crisis: