---
title: "Replication crisis"
chunk: 2/15
source: "https://en.wikipedia.org/wiki/Replication_crisis"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T03:45:08.741659+00:00"
instance: "kb-cron"
---

A null hypothesis test is a decision procedure which takes in some data, and outputs either $H_0$ or $H_1$. If it outputs $H_1$, it is usually stated as "there is a statistically significant effect" or "the null hypothesis is rejected".
Often, the statistical test is a (one-sided) threshold test, which is structured as follows:

Gather data $D$.
Compute a test statistic $t[D]$ for the data.
Compare the test statistic against a critical value/threshold $t_{\text{threshold}}$. If $t[D] > t_{\text{threshold}}$, then output $H_1$, else, output $H_0$.
A two-sided threshold test is similar, but with two thresholds, such that it outputs $H_1$ if either $t[D] < t_{\text{threshold}}^{-}$ or $t[D] > t_{\text{threshold}}^{+}$.
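
The one-sided and two-sided threshold tests described above can be sketched in a few lines of Python. This is only an illustration: the function names, the choice of the sample mean as the test statistic, and the sample data and threshold values are all hypothetical and not taken from any particular study.

```python
from statistics import mean

def one_sided_test(data, statistic, threshold):
    """Output "H1" if the test statistic exceeds the threshold, else "H0"."""
    return "H1" if statistic(data) > threshold else "H0"

def two_sided_test(data, statistic, lower, upper):
    """Output "H1" if the statistic is below `lower` or above `upper`, else "H0"."""
    t = statistic(data)
    return "H1" if (t < lower or t > upper) else "H0"

# Hypothetical data and thresholds, chosen only for illustration.
sample = [0.3, 1.1, 0.8, 1.4, 0.9]
print(one_sided_test(sample, mean, threshold=0.5))
print(two_sided_test(sample, mean, lower=-0.5, upper=1.0))
```

In practice the thresholds are not picked by hand but derived from the distribution of the test statistic under the null hypothesis.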
  

There are four possible outcomes of a null hypothesis test: false negative, true negative, false positive, true positive. A false positive means that $H_0$ is true, but the test outcome is $H_1$; a true negative means that $H_0$ is true, and the test outcome is $H_0$; a false negative means that $H_1$ is true, but the test outcome is $H_0$; a true positive means that $H_1$ is true, and the test outcome is $H_1$.
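
As a small illustration, the four outcomes can be enumerated by pairing the hypothesis that actually holds with the hypothesis the test outputs. The sketch follows the standard convention that "positive" refers to the test outputting $H_1$. The function name `classify_outcome` is a hypothetical helper introduced here.

```python
def classify_outcome(truth, decision):
    """Name the outcome, given the true hypothesis and the test's output.

    Both arguments are the strings "H0" or "H1".
    """
    if decision == "H1":
        # The test reported an effect ("positive").
        return "true positive" if truth == "H1" else "false positive"
    # The test reported no effect ("negative").
    return "true negative" if truth == "H0" else "false negative"

for truth in ("H0", "H1"):
    for decision in ("H0", "H1"):
        print(f"truth={truth}, test output={decision}: {classify_outcome(truth, decision)}")
```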

Significance level, false positive rate, or the alpha level, is the probability of finding the alternative to be true when the null hypothesis is true:
$$(\text{significance}) := \alpha := \Pr(\text{find } H_1 \mid H_0)$$
For example, when the test is a one-sided threshold test, then $\alpha = \Pr_{D \sim H_0}(t[D] > t_{\text{threshold}})$ where $D \sim H_0$ means "the data is sampled from $H_0$".
Statistical power, true positive rate, is the probability of finding the alternative to be true when the alternative hypothesis is true:
$$(\text{power}) := 1 - \beta := \Pr(\text{find } H_1 \mid H_1)$$
where $\beta$ is also called the false negative rate. For example, when the test is a one-sided threshold test, then $1 - \beta = \Pr_{D \sim H_1}(t[D] > t_{\text{threshold}})$.
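
To make the two definitions concrete, the following sketch computes the significance level and the power of a one-sided threshold test under a simplifying assumption that is not part of the text above: the test statistic is normally distributed with unit variance, centred at 0 under $H_0$ and at a hypothetical `effect` under $H_1$. The threshold and effect values are illustrative only.

```python
import random
from statistics import NormalDist

def rejection_probability(threshold, shift):
    """Pr(t > threshold) when the test statistic is N(shift, 1) (assumed model)."""
    return 1.0 - NormalDist(mu=shift, sigma=1.0).cdf(threshold)

def simulated_rejection_rate(threshold, shift, trials=100_000, seed=0):
    """Monte Carlo estimate of the same probability, by sampling the statistic."""
    rng = random.Random(seed)
    rejections = sum(rng.gauss(shift, 1.0) > threshold for _ in range(trials))
    return rejections / trials

threshold = 1.645  # hypothetical critical value
effect = 2.5       # hypothetical shift of the statistic under H1

alpha = rejection_probability(threshold, shift=0.0)     # false positive rate
power = rejection_probability(threshold, shift=effect)  # true positive rate
beta = 1.0 - power                                      # false negative rate

print(f"alpha = {alpha:.3f}, power = {power:.3f}, beta = {beta:.3f}")
print(f"simulated alpha = {simulated_rejection_rate(threshold, 0.0):.3f}")
```

Under this model, raising the threshold lowers both the significance level and the power, which is the usual trade-off between false positives and false negatives.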
Given a statistical test and a data set $D$, the corresponding p-value is the probability that the test statistic is at least as extreme as the observed value $t[D]$, conditional on $H_0$. For example, for a one-sided threshold test,
$$p[D] = \Pr_{D' \sim H_0}(t[D'] > t[D])$$
If the null hypothesis is true, then the p-value is distributed uniformly on $[0,1]$. Otherwise, it is typically peaked at $p = 0.0$ and roughly exponential, though the precise shape of the p-value distribution depends on what the alternative hypothesis is.
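
The behaviour of the p-value distribution can be checked with a short simulation. The sketch below assumes, purely for illustration, that the test statistic is standard normal under the null hypothesis and shifted by 2 under the alternative. The function names and the shift value are hypothetical.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()

def p_value(t_observed):
    """One-sided p-value: Pr(t' > t_observed) with t' ~ N(0, 1) under H0 (assumed)."""
    return 1.0 - STD_NORMAL.cdf(t_observed)

def simulate_p_values(shift, trials=50_000, seed=0):
    """Draw test statistics from N(shift, 1) and convert each to a p-value."""
    rng = random.Random(seed)
    return [p_value(rng.gauss(shift, 1.0)) for _ in range(trials)]

def fraction_below(values, cutoff):
    """Share of values strictly below the cutoff."""
    return sum(v < cutoff for v in values) / len(values)

null_ps = simulate_p_values(shift=0.0)  # H0 true: p-values should be uniform
alt_ps = simulate_p_values(shift=2.0)   # H1 true: p-values pile up near 0

for cutoff in (0.05, 0.25, 0.5):
    print(cutoff, fraction_below(null_ps, cutoff), fraction_below(alt_ps, cutoff))
```

Under the null, the fraction of p-values below each cutoff should be close to the cutoff itself, which is what uniformity on $[0,1]$ means. Under the alternative, the fractions are much larger at small cutoffs.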
Because the p-values are distributed uniformly on $[0,1]$ under the null hypothesis, researchers can set any significance level $\alpha$ by computing the p-value and then outputting $H_1$ if $p[D] < \alpha$. This is usually stated as "the null hypothesis is rejected at significance level $\alpha$", or "$H_1 \; (p < \alpha)$", such as "smoking is correlated with cancer (p < 0.001)".

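The rule "output $H_1$ if the p-value is below $\alpha$" gives the same decisions as comparing the test statistic against a suitably chosen threshold. The sketch below illustrates this under the assumption (not made in the text) that the statistic is standard normal under the null hypothesis. The function names and the sample statistic values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def decide_by_p_value(t_observed, alpha):
    """Output "H1" when the one-sided p-value is below alpha."""
    p = 1.0 - STD_NORMAL.cdf(t_observed)
    return "H1" if p < alpha else "H0"

def decide_by_threshold(t_observed, alpha):
    """Output "H1" when the statistic exceeds the (1 - alpha) quantile under H0."""
    threshold = STD_NORMAL.inv_cdf(1.0 - alpha)
    return "H1" if t_observed > threshold else "H0"

# Hypothetical observed statistics, kept away from the exact threshold.
for t in (0.5, 1.2, 1.7, 2.4, 3.3):
    print(t, decide_by_p_value(t, 0.05), decide_by_threshold(t, 0.05))
```
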
=== History ===
The replication crisis dates to a number of events in the early 2010s. Felipe Romero identified four precursors to the crisis: