kb/data/en.wikipedia.org/wiki/Invalid_science-1.md

5.8 KiB

title chunk source category tags date_saved instance
Invalid science 2/3 https://en.wikipedia.org/wiki/Invalid_science reference science, encyclopedia 2026-05-05T04:28:14.614270+00:00 kb-cron

=== Peer review === Peer review is the primary validation technique employed by scientific publications. However, a prominent medical journal tested the system and found major failings. It supplied research with induced errors and found that most reviewers failed to spot the mistakes, even after being told of the tests. A pseudonymous fabricated paper on the effects of a chemical derived from lichen on cancer cells was submitted to 304 journals for peer review. The paper was filled with errors of study design, analysis and interpretation. 157 lower-rated journals accepted it. Another study sent an article containing eight deliberate mistakes in study design, analysis and interpretation to more than 200 of the British Medical Journal's regular reviewers. On average, they reported fewer than two of the problems. Peer reviewers typically do not re-analyze data from scratch, checking only that the authors' analysis is properly conceived.

=== Statistics ===

==== Type I and type II errors ==== Scientists divide errors into type I, incorrectly asserting the truth of a hypothesis (false positive) and type II, rejecting a correct hypothesis (false negative). Statistical checks assess the probability that data which seem to support a hypothesis come about simply by chance. If the probability is less than 5%, the evidence is rated "statistically significant". One definitional consequence is a type one error rate of one in 20.

==== Statistical power ==== In 2005 Stanford epidemiologist John Ioannidis showed that the idea that only one paper in 20 gives a false-positive result was incorrect. He claimed, "most published research findings are probably false." He found three categories of problems: insufficient "statistical power" (avoiding type II errors); the unlikeliness of the hypothesis; and publication bias favoring novel claims. A statistically powerful study identifies factors with only small effects on data. In general studies with more repetitions that run the experiment more times on more subjects have greater power. A power of 0.8 means that of ten true hypotheses tested, the effects of two are missed. Ioannidis found that in neuroscience the typical statistical power is 0.21; another study found that psychology studies average 0.35. Unlikeliness is a measure of the degree of surprise in a result. Scientists prefer surprising results, leading them to test hypotheses that are unlikely to very unlikely. Ioannidis claimed that in epidemiology, some one in ten hypotheses should be true. In exploratory disciplines like genomics, which rely on examining voluminous data about genes and proteins, only one in a thousand should prove correct. In a discipline in which 100 out of 1,000 hypotheses are true, studies with a power of 0.8 will find 80 and miss 20. Of the 900 incorrect hypotheses, 5% or 45 will be accepted because of type I errors. Adding the 45 false positives to the 80 true positives gives 125 positive results, or 36% specious. Dropping statistical power to 0.4, optimistic for many fields, would still produce 45 false positives but only 40 true positives, less than half. Negative results are more reliable. Statistical power of 0.8 produces 875 negative results of which only 20 are false, giving an accuracy of over 97%. Negative results however account for a minority of published results, varying by discipline. A study of 4,600 papers found that the proportion of published negative results dropped from 30% to 14% between 1990 and 2007. Subatomic physics sets an acceptable false-positive rate of one in 3.5m (known as the five-sigma standard). However, even this does not provide perfect protection. The problem invalidates some 3/4s of machine learning studies according to one review.

==== Statistical significance ==== Statistical significance is a measure for testing statistical correlation. It was invented by English mathematician Ronald Fisher in the 1920s. It defines a "significant" result as any data point that would be produced by chance less than 5% (or more stringently, 1%) of the time. A significant result is widely seen as an important indicator that the correlation is not random. While correlations track the relationship between truly independent measurements, such as smoking and cancer, they are much less effective when variables cannot be isolated, a common circumstance in biological systems. For example, statistics found a high correlation between lower back pain and abnormalities in spinal discs, although it was later discovered that serious abnormalities were present in two-thirds of pain-free patients.

=== Minimum threshold publishers === Journals such as PLoS One use a "minimal-threshold" standard, seeking to publish as much science as possible, rather than to pick out the best work. Their peer reviewers assess only whether a paper is methodologically sound. Almost half of their submissions are still rejected on that basis.

=== Unpublished research === Only 22% of the clinical trials financed by the National Institutes of Health (NIH) released summary results within one year of completion, even though the NIH requires it. Fewer than half published within 30 months; a third remained unpublished after 51 months. When other scientists rely on invalid research, they may waste time on lines of research that are themselves invalid. The failure to report failures means that researchers waste money and effort exploring blind alleys already investigated by other scientists.

=== Fraud === In 21 surveys of academics (mostly in the biomedical sciences but also in civil engineering, chemistry and economics) carried out between 1987 and 2008, 2% admitted fabricating data, but 28% claimed to know of colleagues who engaged in questionable research practices.