kb/data/en.wikipedia.org/wiki/Replication_crisis-1.md

---
title: "Replication crisis"
chunk: 2/15
source: "https://en.wikipedia.org/wiki/Replication_crisis"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T03:17:03.520061+00:00"
instance: "kb-cron"
---

A null hypothesis test is a decision procedure which takes in some data, and outputs either


          H

            0


    {\displaystyle H_{0}}

 or


          H

            1


    {\displaystyle H_{1}}

. If it outputs


          H

            1


    {\displaystyle H_{1}}

, it is usually stated as "there is a statistically significant effect" or "the null hypothesis is rejected".
Often, the statistical test is a (one-sided) threshold test, which is structured as follows:

Gather data


        D


    {\displaystyle D}

.
Compute a test statistic


        t
        [
        D
        ]


    {\displaystyle t[D]}

 for the data.
Compare the test statistic against a critical value/threshold


          t

            threshold


    {\displaystyle t_{\text{threshold}}}

. If


        t
        [
        D
        ]
        >

          t

            threshold


    {\displaystyle t[D]>t_{\text{threshold}}}

, then output


          H

            1


    {\displaystyle H_{1}}

, else, output


          H

            0


    {\displaystyle H_{0}}

.
A two-sided threshold test is similar, but with two thresholds, such that it outputs


          H

            1


    {\displaystyle H_{1}}

 if either


        t
        [
        D
        ]
        <

          t

            threshold


            −


    {\displaystyle t[D]<t_{\text{threshold}}^{-}}

 or


        t
        [
        D
        ]
        >

          t

            threshold


            +


    {\displaystyle t[D]>t_{\text{threshold}}^{+}}


There are 4 possible outcomes of a null hypothesis test: false negative, true negative, false positive, true positive. A false negative means that


          H

            0


    {\displaystyle H_{0}}

 is true, but the test outcome is


          H

            1


    {\displaystyle H_{1}}

; a true negative means that


          H

            0


    {\displaystyle H_{0}}

 is true, and the test outcome is


          H

            0


    {\displaystyle H_{0}}

, etc.

Significance level, false positive rate, or the alpha level, is the probability of finding the alternative to be true when the null hypothesis is true:


        (

          significance

        )
        :=
        α
        :=
        P
        r
        (

          find


          H

            1


          |


          H

            0


        )


    {\displaystyle ({\text{significance}}):=\alpha :=Pr({\text{find }}H_{1}|H_{0})}

For example, when the test is a one-sided threshold test, then


        α
        =
        P

          r

            D
            ∼

              H

                0


        (
        t
        [
        D
        ]
        >

          t

            threshold


        )


    {\displaystyle \alpha =Pr_{D\sim H_{0}}(t[D]>t_{\text{threshold}})}

 where


        D
        ∼

          H

            0


    {\displaystyle D\sim H_{0}}

 means "the data is sampled from


          H

            0


    {\displaystyle H_{0}}

".
Statistical power, true positive rate, is the probability of finding the alternative to be true when the alternative hypothesis is true:


        (

          power

        )
        :=
        1
        −
        β
        :=
        P
        r
        (

          find


          H

            1


          |


          H

            1


        )


    {\displaystyle ({\text{power}}):=1-\beta :=Pr({\text{find }}H_{1}|H_{1})}

where


        β


    {\displaystyle \beta }

 is also called the false negative rate. For example, when the test is a one-sided threshold test, then


        1
        −
        β
        =
        P

          r

            D
            ∼

              H

                1


        (
        t
        [
        D
        ]
        >

          t

            threshold


        )


    {\displaystyle 1-\beta =Pr_{D\sim H_{1}}(t[D]>t_{\text{threshold}})}

.
Given a statistical test and a data set


        D


    {\displaystyle D}

, the corresponding p-value is the probability that the test statistic is at least as extreme, conditional on


          H

            0


    {\displaystyle H_{0}}

. For example, for a one-sided threshold test,


        p
        [
        D
        ]
        =
        P

          r


              D
              ′

            ∼

              H

                0


        (
        t
        [

          D
          ′

        ]
        >
        t
        [
        D
        ]
        )


    {\displaystyle p[D]=Pr_{D'\sim H_{0}}(t[D']>t[D])}

If the null hypothesis is true, then the p-value is distributed uniformly on


        [
        0
        ,
        1
        ]


    {\displaystyle [0,1]}

. Otherwise, it is typically peaked at


        p
        =
        0.0


    {\displaystyle p=0.0}

 and roughly exponential, though the precise shape of the p-value distribution depends on what the alternative hypothesis is.
Because the p-values are distributed uniformly on


        [
        0
        ,
        1
        ]


    {\displaystyle [0,1]}

 under the null hypothesis, researchers can set any significance level


        α


    {\displaystyle \alpha }

 by computing the p-value, then output


          H

            1


    {\displaystyle H_{1}}

 if


        p
        [
        D
        ]
        <
        α


    {\displaystyle p[D]<\alpha }

. This is usually stated as "the null hypothesis is rejected at significance level


        α


    {\displaystyle \alpha }

", or "


          H

            1


        (
        p
        <
        α
        )


    {\displaystyle H_{1}\;(p<\alpha )}

", such as "smoking is correlated with cancer (p < 0.001)".

=== History ===
The replication crisis dates to a number of events in the early 2010s. Felipe Romero identified four precursors to the crisis: