kb/data/en.wikipedia.org/wiki/History_of_statistics-5.md


| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| History of statistics | 6/8 | https://en.wikipedia.org/wiki/History_of_statistics | reference | science, encyclopedia | 2026-05-05T04:00:26.751121+00:00 | kb-cron |

Galton's publication of Natural Inheritance in 1889 sparked the interest of a brilliant mathematician, Karl Pearson, then working at University College London, who went on to found the discipline of mathematical statistics. He emphasised the statistical foundation of scientific laws and promoted its study. His laboratory attracted students from around the world, including Udny Yule, who were drawn by his new methods of analysis. His work grew to encompass the fields of biology, epidemiology, anthropometry, medicine and social history. In 1901, with Walter Weldon, founder of biometry, and Galton, he founded the journal Biometrika as the first journal of mathematical statistics and biometry. His work, and that of Galton, underpins many of the 'classical' statistical methods in common use today, including the correlation coefficient, defined as a product-moment; the method of moments for fitting distributions to samples; Pearson's system of continuous curves, which forms the basis of the now conventional continuous probability distributions; chi distance, a precursor and special case of the Mahalanobis distance; and the p-value, defined as the probability measure of the complement of the ball with the hypothesized value as center point and chi distance as radius. He also introduced the term 'standard deviation', founded the theory of statistical hypothesis testing, and introduced Pearson's chi-squared test and principal component analysis. In 1911 he founded the world's first university statistics department at University College London.
The second wave of mathematical statistics was pioneered by Ronald Fisher, who wrote two textbooks, Statistical Methods for Research Workers (1925) and The Design of Experiments (1935), that were to define the academic discipline in universities around the world. He also systematized previous results, putting them on a firm mathematical footing.
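Two of the 'classical' methods credited to Pearson earlier in this section are short enough to compute directly. The following is a minimal sketch in plain Python, with hypothetical helper names (`pearson_r`, `chi_squared_statistic`, `chi_squared_p_value_1df`) and hypothetical coin-toss data. It computes the product-moment correlation coefficient and Pearson's chi-squared goodness-of-fit statistic. The p-value helper covers only the one-degree-of-freedom case, where the p-value is the probability that a standard normal variable falls outside the interval of radius √χ² around zero. This is a one-dimensional instance of the "ball" in the p-value definition above. It illustrates the formulas and is not Pearson's own computational procedure.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation: sum of products of deviations from the means,
    divided by the square root of the product of the sums of squared deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chi_squared_statistic(observed, expected):
    """Pearson's goodness-of-fit statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_p_value_1df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    With one degree of freedom the statistic behaves like Z**2 for a standard
    normal Z, so P(Z**2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly linear data, prints 1.0
    # Hypothetical data: 60 heads and 40 tails against a fair-coin expectation of 50/50.
    stat = chi_squared_statistic([60, 40], [50, 50])
    print(stat)                           # 4.0
    print(chi_squared_p_value_1df(stat))  # about 0.0455
```

For more than one degree of freedom, the tail probability needs the incomplete gamma function. That is outside the standard library, so the sketch stops at the one-degree case.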
In his seminal 1918 paper, The Correlation between Relatives on the Supposition of Mendelian Inheritance, Fisher was the first to use the statistical term variance. In 1919, at Rothamsted Experimental Station, he started a major study of the extensive collections of data recorded over many years, which resulted in a series of reports under the general title Studies in Crop Variation. Over the next seven years, he pioneered the principles of the design of experiments (see below) and elaborated his studies of analysis of variance. He furthered his studies of the statistics of small samples. Perhaps more important still, he began his systematic approach of using the analysis of real data as the springboard for the development of new statistical methods. He developed computational algorithms for analyzing data from his balanced experimental designs. In 1925 this work resulted in the publication of his first book, Statistical Methods for Research Workers, which went through many editions and translations in later years and became the standard reference work for scientists in many disciplines. It was followed in 1935 by The Design of Experiments, which was also widely used. Between these two books, in 1930, he published The Genetical Theory of Natural Selection, in which he applied statistics to evolution. In addition to analysis of variance, Fisher named and promoted the method of maximum likelihood estimation. He also originated the concepts of sufficiency, ancillary statistics, Fisher's linear discriminant and Fisher information. His 1924 article On a distribution yielding the error functions of several well known statistics presented Pearson's chi-squared test and William Sealy Gosset's t in the same framework as the Gaussian distribution, together with his own parameter in the analysis of variance, Fisher's z-distribution (more commonly used decades later in the form of the F distribution). The 5% level of significance appears to have been introduced by Fisher in 1925.
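Fisher's analysis of variance, and the z statistic from his 1924 article, can be illustrated with a simple one-way layout. This is a minimal sketch, and the function name `one_way_anova` and the example data are hypothetical. It splits the variation into between-group and within-group sums of squares and forms the ratio of their mean squares, which is the quantity now referred to the F distribution. It then reports Fisher's z as half the natural logarithm of that ratio. It shows the arithmetic only. It computes no p-value and does not reproduce Fisher's own tables or computational layouts.

```python
import math
from statistics import fmean


def one_way_anova(groups):
    """Return (F, z) for a one-way analysis of variance.

    F is the between-group mean square divided by the within-group mean square;
    Fisher's z is half the natural logarithm of F.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    f = ms_between / ms_within
    # log(0) is undefined, so identical group means map to z = -infinity.
    z = 0.5 * math.log(f) if f > 0 else float("-inf")
    return f, z


if __name__ == "__main__":
    f, z = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(f)  # 27.0
    print(z)  # half of ln(27), about 1.648
```

Working on the log scale was one way to tabulate the ratio compactly. The sketch returns both forms so they can be compared.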
Fisher stated that deviations exceeding twice the standard deviation were to be regarded as significant. Before this, deviations exceeding three times the probable error were considered significant. For a symmetrical distribution the probable error is half the interquartile range. For a normal distribution it is approximately two-thirds of the standard deviation (about 0.6745σ), so three probable errors come to roughly two standard deviations. Under a normal distribution, a deviation beyond two standard deviations has a two-sided probability of about 4.6%. It appears that Fisher's 5% criterion was rooted in previous practice. Other important contributions at this time included Charles Spearman's rank correlation coefficient, a useful extension of the Pearson correlation coefficient. William Sealy Gosset, the English statistician better known by his pseudonym, Student, introduced Student's t-distribution. This continuous probability distribution is useful when the sample size is small and the population standard deviation is unknown. Egon Pearson (Karl's son) and Jerzy Neyman introduced the concepts of "Type II" error, the power of a test and confidence intervals. In 1934, Jerzy Neyman showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.
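The argument linking Fisher's two-standard-deviation rule to the older three-probable-error convention is a short calculation on the normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`. It takes the probable error as half the interquartile range of a standard normal, in units of σ, and triples it. It then compares the two-sided tail probabilities beyond 2σ and beyond three probable errors. It assumes exact normality, which is the setting in which the two-thirds approximation quoted above holds. The helper name `two_sided_tail` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probable error: half the interquartile range. For a standard normal this equals
# the upper quartile, so the result is in units of sigma.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
probable_error = (q3 - q1) / 2              # about 0.6745 sigma
three_probable_errors = 3 * probable_error  # about 2.02 sigma


def two_sided_tail(k):
    """Probability that a standard normal deviate lies more than k sigma from the mean."""
    return 2 * (1 - std_normal.cdf(k))


if __name__ == "__main__":
    print(f"probable error      = {probable_error:.4f} sigma")
    print(f"3 x probable error  = {three_probable_errors:.4f} sigma")
    print(f"P(|Z| > 2 sigma)    = {two_sided_tail(2.0):.4f}")
    print(f"P(|Z| > 3 PE)       = {two_sided_tail(three_probable_errors):.4f}")
    # The cutoff that gives exactly 5% in two tails, for comparison.
    print(f"5% two-sided cutoff = {std_normal.inv_cdf(0.975):.4f} sigma")
```

Both older and newer thresholds leave roughly 4 to 5% of a normal distribution in the two tails. This is the numerical sense in which the 5% level continued earlier practice.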

== Design of experiments ==