kb/Replication_crisis-1.md at 0db041748cd021fbcfafd99409a6bbc66b175cbf

turtle89431 2efe0d5f23 Scrape wikipedia-science: 750 new, 30 updated, 803 total (kb-cron)

2026-05-04 20:17:13 -07:00

8.2 KiB


title	chunk	source	category	tags	date_saved	instance
Replication crisis	2/15	https://en.wikipedia.org/wiki/Replication_crisis	reference	science, encyclopedia	2026-05-05T03:17:03.520061+00:00	kb-cron

A null hypothesis test is a decision procedure that takes in some data and outputs either $H_0$ or $H_1$. If it outputs $H_1$, the result is usually stated as "there is a statistically significant effect" or "the null hypothesis is rejected". Often, the statistical test is a (one-sided) threshold test, which is structured as follows:

1. Gather data $D$.
2. Compute a test statistic $t[D]$ for the data.
3. Compare the test statistic against a critical value (threshold) $t_{\text{threshold}}$.
4. If $t[D] > t_{\text{threshold}}$, output $H_1$; otherwise, output $H_0$.
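The threshold-test procedure above can be written as a small decision function. This is a minimal Python sketch, not a standard implementation: `z_statistic` is a hypothetical choice of $t[D]$ that assumes $H_0$ says the observations are i.i.d. standard normal, and the threshold 1.645 (roughly the one-sided 5% point of the standard normal) is only an illustrative value.

```python
import math
from typing import Callable, Sequence


def one_sided_threshold_test(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    t_threshold: float,
) -> str:
    """Return "H1" if t[D] > t_threshold, otherwise "H0"."""
    t = statistic(data)  # compute the test statistic t[D]
    return "H1" if t > t_threshold else "H0"  # compare against the threshold and decide


def z_statistic(data: Sequence[float]) -> float:
    # Hypothetical t[D]: sqrt(n) * sample mean. If H0 says the data are
    # i.i.d. N(0, 1), this statistic is standard normal under H0.
    n = len(data)
    return math.sqrt(n) * (sum(data) / n)


# Example: t[D] = sqrt(4) * 0.9 = 1.8 > 1.645, so this prints H1.
print(one_sided_threshold_test([0.9, 1.2, 0.4, 1.1], z_statistic, 1.645))
```

Passing the statistic in as a function keeps the decision rule separate from the choice of $t[D]$, which is the part that differs between tests.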

A two-sided threshold test is similar but uses two thresholds: it outputs $H_1$ if either $t[D] < t_{\text{threshold}}^{-}$ or $t[D] > t_{\text{threshold}}^{+}$, and $H_0$ otherwise.
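A two-sided version only changes the final comparison. In this sketch, `symmetric_z_thresholds` is a hypothetical helper that assumes the statistic is standard normal under $H_0$ and splits $\alpha$ equally between the two tails. Other ways of choosing $t_{\text{threshold}}^{-}$ and $t_{\text{threshold}}^{+}$ are possible.

```python
from statistics import NormalDist


def two_sided_threshold_test(t: float, t_lower: float, t_upper: float) -> str:
    """Return "H1" if t < t_lower or t > t_upper, otherwise "H0"."""
    return "H1" if (t < t_lower or t > t_upper) else "H0"


def symmetric_z_thresholds(alpha: float) -> tuple[float, float]:
    # Assumption: t[D] ~ N(0, 1) under H0, with alpha / 2 in each tail.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return -z, z


t_lower, t_upper = symmetric_z_thresholds(0.05)  # roughly (-1.96, 1.96)
print(two_sided_threshold_test(-2.5, t_lower, t_upper))  # prints H1: below t_lower
print(two_sided_threshold_test(0.3, t_lower, t_upper))   # prints H0: between the thresholds
```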

There are four possible outcomes of a null hypothesis test: false negative, true negative, false positive, and true positive. A false positive means that $H_0$ is true but the test outcome is $H_1$; a true negative means that $H_0$ is true and the test outcome is $H_0$; a false negative means that $H_1$ is true but the test outcome is $H_0$; and a true positive means that $H_1$ is true and the test outcome is $H_1$.
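The four outcomes form a 2-by-2 table indexed by which hypothesis is actually true and which one the test outputs, where "positive" means the test outputs $H_1$. A minimal sketch of that labeling (the function name is illustrative):

```python
def classify_outcome(truth: str, decision: str) -> str:
    """Label a test result, given the true hypothesis and the test's output.

    "positive" means the test output H1; "true"/"false" says whether the
    output matches the hypothesis that actually holds.
    """
    if truth not in ("H0", "H1") or decision not in ("H0", "H1"):
        raise ValueError("truth and decision must each be 'H0' or 'H1'")
    correctness = "true" if decision == truth else "false"
    polarity = "positive" if decision == "H1" else "negative"
    return f"{correctness} {polarity}"


for truth in ("H0", "H1"):
    for decision in ("H0", "H1"):
        print(f"truth={truth}, output={decision}: {classify_outcome(truth, decision)}")
```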

The significance level, also called the false positive rate or alpha level, is the probability that the test outputs $H_1$ when the null hypothesis is true:

$$(\text{significance}) := \alpha := \Pr(\text{find } H_1 \mid H_0)$$
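Because $\alpha$ is a probability over data drawn under $H_0$, it can be estimated by simulation: sample many data sets from $H_0$, run the test on each, and count how often it outputs $H_1$. The sketch below assumes, hypothetically, that $H_0$ means the observations are i.i.d. standard normal, uses the illustrative one-sided statistic $\sqrt{n}\,\bar{x}$ with threshold 1.645, and compares the estimate with the closed-form value $1 - \Phi(t_{\text{threshold}})$ that holds under that assumption ($\Phi$ is the standard normal CDF).

```python
import math
import random
from statistics import NormalDist


def z_statistic(data):
    # Hypothetical t[D]: sqrt(n) * sample mean (standard normal under the assumed H0).
    return math.sqrt(len(data)) * (sum(data) / len(data))


def estimate_alpha(t_threshold, n=5, trials=40_000, seed=0):
    """Monte Carlo estimate of Pr(find H1 | H0) for a one-sided z-test."""
    rng = random.Random(seed)
    found_h1 = 0
    for _ in range(trials):
        data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # D ~ H0 (assumed i.i.d. N(0, 1))
        if z_statistic(data) > t_threshold:             # the test outputs H1
            found_h1 += 1
    return found_h1 / trials


def exact_alpha(t_threshold):
    # Under the N(0, 1) assumption, alpha = 1 - Phi(t_threshold).
    return 1.0 - NormalDist().cdf(t_threshold)


print(f"simulated alpha ~ {estimate_alpha(1.645):.4f}")
print(f"exact alpha     = {exact_alpha(1.645):.4f}")
```

Both numbers should come out close to 0.05; the simulated value fluctuates with the seed and the number of trials.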

For example, when the test is a one-sided threshold test, then

$$\alpha = \Pr_{D\sim H_0}\left(t[D] > t_{\text{threshold}}\right)$$

where $D\sim H_0$ means "the data is sampled from $H_0$". Statistical power, or the true positive rate, is the probability that the test outputs $H_1$ when the alternative hypothesis is true:

$$(\text{power}) := 1-\beta := \Pr(\text{find } H_1 \mid H_1)$$

where $\beta$ is also called the false negative rate. For example, when the test is a one-sided threshold test, then

$$1-\beta = \Pr_{D\sim H_1}\left(t[D] > t_{\text{threshold}}\right)$$
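Under a concrete (hypothetical) alternative, the one-sided formulas above can be evaluated directly. This sketch assumes the illustrative statistic $t[D] = \sqrt{n}\,\bar{x}$, with $H_0$: data i.i.d. $N(0,1)$ and $H_1$: data i.i.d. $N(\mu,1)$ for some $\mu > 0$. Under $H_1$ the statistic is then $N(\mu\sqrt{n}, 1)$, giving $1-\beta = 1 - \Phi(t_{\text{threshold}} - \mu\sqrt{n})$, where $\Phi$ is the standard normal CDF.

```python
import math
from statistics import NormalDist


def power_one_sided_z(t_threshold: float, mu: float, n: int) -> float:
    """1 - beta = Pr_{D ~ H1}(t[D] > t_threshold) for the assumed z-test.

    Assumes H1 means the data are i.i.d. N(mu, 1), so that
    t[D] = sqrt(n) * mean ~ N(mu * sqrt(n), 1).
    """
    shift = mu * math.sqrt(n)
    return 1.0 - NormalDist().cdf(t_threshold - shift)


t_threshold = 1.645  # illustrative one-sided 5% threshold
for n in (10, 25, 50):
    power = power_one_sided_z(t_threshold, mu=0.5, n=n)
    print(f"n={n}: power={power:.3f}, beta={1 - power:.3f}")
```

With $\mu = 0$ the function returns the significance level $\alpha$, since the assumed $H_1$ then coincides with $H_0$. For a fixed effect size $\mu$, power rises toward 1 as $n$ grows (about 0.80 at $\mu = 0.5$, $n = 25$).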

Given a statistical test and a data set $D$, the corresponding p-value is the probability, assuming $H_0$ is true, of obtaining a test statistic at least as extreme as the observed value $t[D]$. For example, for a one-sided threshold test,

$$p[D] = \Pr_{D'\sim H_0}\left(t[D'] > t[D]\right)$$
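As a hypothetical example, take $t[D] = \sqrt{n}\,\bar{x}$ and suppose $H_0$ means the observations are i.i.d. $N(0,1)$. Then $t[D'] \sim N(0,1)$ for $D' \sim H_0$, and the formula reduces to $p[D] = 1 - \Phi(t[D])$, where $\Phi$ is the standard normal CDF. A minimal sketch under those assumptions:

```python
import math
from statistics import NormalDist


def z_statistic(data):
    # Hypothetical t[D]: sqrt(n) * sample mean, standard normal under the assumed H0.
    return math.sqrt(len(data)) * (sum(data) / len(data))


def p_value_one_sided(data):
    """p[D] = Pr_{D' ~ H0}(t[D'] > t[D]) = 1 - Phi(t[D]) under the assumed H0."""
    return 1.0 - NormalDist().cdf(z_statistic(data))


print(f"{p_value_one_sided([0.9, 1.2, 0.4, 1.1]):.4f}")  # t[D] = 1.8, so p is about 0.036
```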

If the null hypothesis is true (and the test statistic is continuous), the p-value is uniformly distributed on $[0,1]$. Otherwise, its distribution is typically peaked at $p=0$ and roughly exponential, though the precise shape depends on what the alternative hypothesis is. Because p-values are uniformly distributed on $[0,1]$

under the null hypothesis, researchers can choose any significance level $\alpha$: compute the p-value, then output $H_1$ if $p[D] < \alpha$. This is usually stated as "the null hypothesis is rejected at significance level $\alpha$", or "$H_1\;(p<\alpha)$", as in "smoking is correlated with cancer (p < 0.001)".
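The claim that thresholding p-values at any $\alpha$ gives a test with significance level $\alpha$ can be checked by simulation. This sketch draws many data sets under a hypothetical $H_0$ (i.i.d. $N(0,1)$ observations, one-sided statistic $\sqrt{n}\,\bar{x}$), computes each p-value, and measures how often the rule "output $H_1$ if $p[D] < \alpha$" fires. The function names and parameters are illustrative.

```python
import math
import random
from statistics import NormalDist


def p_value_one_sided(data):
    # Hypothetical test: t[D] = sqrt(n) * mean, standard normal under the assumed H0.
    t = math.sqrt(len(data)) * (sum(data) / len(data))
    return 1.0 - NormalDist().cdf(t)


def decide(p, alpha):
    """Output H1 ("reject H0 at level alpha") exactly when p[D] < alpha."""
    return "H1" if p < alpha else "H0"


def null_p_values(n=5, trials=20_000, seed=1):
    # Sample data sets D ~ H0 and record their p-values.
    rng = random.Random(seed)
    return [
        p_value_one_sided([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]


def false_positive_rate(p_values, alpha):
    # Fraction of null data sets on which the rule outputs H1.
    return sum(decide(p, alpha) == "H1" for p in p_values) / len(p_values)


ps = null_p_values()
for alpha in (0.01, 0.05, 0.2, 0.5):
    print(f"alpha={alpha}: observed false positive rate {false_positive_rate(ps, alpha):.3f}")
```

Because the simulated p-values are approximately uniform on $[0,1]$, each observed rate should land near the corresponding $\alpha$, up to Monte Carlo noise.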

=== History ===

The replication crisis dates to a number of events in the early 2010s. Felipe Romero identified four precursors to the crisis:
