kb/Replication_crisis-10.md at 0cf396810ce28080e04e1620d7a42934037bbe21

turtle89431 0cf396810c Scrape wikipedia-science: 0 new, 780 updated, 803 total (kb-cron)

2026-05-04 20:45:21 -07:00

8.3 KiB

Raw Blame History

title	chunk	source	category	tags	date_saved	instance
Replication crisis	11/15	https://en.wikipedia.org/wiki/Replication_crisis	reference	science, encyclopedia	2026-05-05T03:45:08.741659+00:00	kb-cron

and the probability of replicating the statistical study is

$$
\Pr(\text{replication} \mid \text{find } H_1) = \Pr(\text{find } H_1 \mid H_1)\,\Pr(H_1 \mid \text{find } H_1) + \Pr(\text{find } H_1 \mid H_0)\,\Pr(H_0 \mid \text{find } H_1)
$$

which also differs from $\Pr(H_1 \mid \text{find } H_1)$. In particular, for a fixed level of significance, the probability of replication increases with both the power and the prior probability of $H_1$. If the prior probability of $H_1$ is small, high power is required for replication. For example, if the prior probability of the null hypothesis is $\Pr(H_0) = 0.9$ and the study found a positive result, then the posterior probability of $H_1$ is $\Pr(H_1 \mid \text{find } H_1) = 0.50$, and the replication probability is $\Pr(\text{replication} \mid \text{find } H_1) = 0.25$.
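The replication formula and the worked example can be checked numerically. The Python sketch below makes several assumptions:

- $\Pr(\text{find } H_1 \mid H_1)$ is the power.
- $\Pr(\text{find } H_1 \mid H_0)$ is the significance level $\alpha$.
- The replication study has the same power and $\alpha$ as the original.

The posterior is computed with Bayes' rule. The example does not state $\alpha$ or the power. The values $\alpha = 0.05$ and power $= 0.45$ used here are inferred because they are the only pair that reproduces both figures. With $\Pr(H_1) = 0.1$, a posterior of 0.50 forces power $= 9\alpha$, and a replication probability of 0.25 then forces $\alpha = 0.05$. The function names are illustrative.

```python
def posterior_h1(prior_h1: float, power: float, alpha: float) -> float:
    """Pr(H1 | find H1) by Bayes' rule."""
    true_pos = power * prior_h1          # Pr(find H1 | H1) * Pr(H1)
    false_pos = alpha * (1 - prior_h1)   # Pr(find H1 | H0) * Pr(H0)
    return true_pos / (true_pos + false_pos)


def replication_probability(prior_h1: float, power: float, alpha: float) -> float:
    """Pr(replication | find H1), assuming the replication has the same power and alpha."""
    post = posterior_h1(prior_h1, power, alpha)
    # Replicates with probability `power` if H1 is true, `alpha` if H0 is true.
    return power * post + alpha * (1 - post)


if __name__ == "__main__":
    # Pr(H0) = 0.9, so Pr(H1) = 0.1; alpha = 0.05 and power = 0.45 are inferred, not stated.
    print(round(posterior_h1(0.1, 0.45, 0.05), 2))             # 0.5
    print(round(replication_probability(0.1, 0.45, 0.05), 2))  # 0.25
```

Raising either the power or the prior probability of $H_1$ in these calls increases the replication probability, as stated above.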

=== Problem with null hypothesis testing ===

Some argue that null hypothesis testing is itself inappropriate, especially in "soft sciences" such as social psychology. As statisticians have repeatedly observed, in complex systems such as social psychology "the null hypothesis is always false", or "everything is correlated". If so, failing to reject the null hypothesis does not show that it is true, only that the test produced a false negative, typically because of low power. Low power is especially prevalent in fields where effect sizes are small and data are expensive to acquire, such as social psychology.

Furthermore, rejecting the null hypothesis may not be evidence for a substantive alternative hypothesis. In the soft sciences, many hypotheses predict a correlation between two variables. Evidence against the null hypothesis "there is no correlation" therefore does not favor any one of the many alternatives that equally predict "there is a correlation". Fisher developed NHST for agronomy, where rejecting the null hypothesis is usually good evidence for the alternative hypothesis, since there are few alternatives. Rejecting "fertilizer does not help" is evidence for "fertilizer helps". In psychology, by contrast, every null hypothesis has many alternative hypotheses. In particular, when studies of extrasensory perception reject the null hypothesis at extremely low p-values (as in the case of Daryl Bem), this does not imply the alternative hypothesis "ESP exists". It is far more likely that the experimental setup contained a small non-ESP signal that was measured precisely.

Paul Meehl noted that statistical hypothesis testing is used differently in "soft" psychology (personality, social, etc.) than in physics. In physics, a theory makes a quantitative prediction and is tested by checking whether the prediction falls within the statistically measured interval. In soft psychology, a theory makes a directional prediction and is tested by checking whether the null hypothesis is rejected in the right direction. Consequently, improved experimental technique makes theories more likely to be falsified in physics but less likely to be falsified in soft psychology. The null hypothesis is always false there, because any two variables are correlated by a "crud factor" of about 0.30. The net effect is an accumulation of theories that remain unfalsified, with no empirical evidence for preferring one over the others.
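To make the low-power point concrete, the Python sketch below approximates the power of a two-sided test of zero correlation. It uses the Fisher z-transformation, a standard large-sample approximation rather than an exact method. The true correlations and sample sizes are illustrative assumptions, not values from the studies discussed here. `correlation_test_power` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def correlation_test_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of H0: rho = 0 when the true correlation is r.

    Fisher z-approximation: atanh(sample correlation) is roughly normal with
    mean atanh(r) and standard deviation 1 / sqrt(n - 3). Requires n > 3.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = math.atanh(r) * math.sqrt(n - 3)     # standardized expected statistic
    # Probability that the statistic falls in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for r, n in [(0.1, 50), (0.1, 1000), (0.3, 200)]:
        print(f"r = {r}, n = {n}: power = {correlation_test_power(r, n):.2f}")
```

Under this approximation, a small true correlation of 0.1 is detected with power of only about 0.11 at n = 50, so a non-rejection there is most likely a false negative. The same correlation is detected with power of about 0.89 at n = 1000. A "crud" correlation of 0.30 is detected with power above 0.99 at n = 200, which illustrates Meehl's point: larger samples make rejecting the null hypothesis nearly automatic.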

=== Base rate fallacy ===

According to philosopher Alexander Bird, a possible reason for the low rates of replicability in certain scientific fields is that a majority of tested hypotheses are false a priori. On this view, low rates of replicability could be consistent with quality science. Relatedly, the expectation that most findings should replicate would be misguided and, according to Bird, a form of base rate fallacy. Bird's argument works as follows. Consider an idealized significance test in which the probability of incorrectly rejecting the null hypothesis (the Type I error rate) is 5% and the probability of correctly rejecting it (the power) is 80%. If a high proportion of tested hypotheses are false, false positives can be numerous compared to true positives. For example, if only 10% of tested hypotheses are actually true, as many as 36% of positive results will be false positives. The claim that the falsity of most tested hypotheses can explain low replicability becomes even more relevant because average statistical power in certain fields may be much lower than 80%. Stanley and colleagues analyzed 200 meta-analyses in psychology and estimated average power there at between 34.1% and 36.4%. With those estimates, the proportion of false positives rises to between 55.2% and 57.6%. A high proportion of false positives would then make many research findings non-replicable. 
Bird notes that the claim that most tested hypotheses in certain scientific fields are false a priori may be plausible given several factors: the complexity of the phenomena under investigation, the fact that theories are seldom undisputed, the "inferential distance" between theories and hypotheses, and the ease with which hypotheses can be generated. The fields Bird cites as examples are clinical medicine, genetic and molecular epidemiology, and social psychology. The situation is radically different in fields where theories have an outstanding empirical basis and hypotheses can easily be derived from them (e.g., experimental physics).
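The arithmetic behind Bird's false-positive figures can be sketched in a few lines of Python. The sketch assumes every tested hypothesis is examined at the same significance level (5%) and the same power, which is a simplification. The helper name `false_positive_share` is hypothetical.

```python
def false_positive_share(prior_true: float, power: float, alpha: float = 0.05) -> float:
    """Fraction of significant (positive) results that are false positives.

    prior_true: proportion of tested hypotheses that are actually true.
    """
    true_positives = power * prior_true        # true hypotheses correctly detected
    false_positives = alpha * (1 - prior_true) # false hypotheses wrongly "detected"
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    print(f"{false_positive_share(0.10, 0.80):.1%}")  # 36.0%
    # Average power estimates for psychology attributed to Stanley and colleagues.
    for power in (0.341, 0.364):
        print(f"power {power}: {false_positive_share(0.10, power):.1%}")
```

With 10% true hypotheses, a power of 0.341 gives about 56.9% false positives and a power of 0.364 gives about 55.3%. Both values fall inside the 55.2% to 57.6% range quoted for psychology.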
