kb/Analysis_of_variance-2.md at 29be3c3aba224e6717ffd8df6cbbabf7ccd962d3

turtle89431 292594baa5 Scrape wikipedia-science: 5966 new, 3181 updated, 9417 total (kb-cron)

2026-05-05 02:49:05 -07:00

9.2 KiB


title	chunk	source	category	tags	date_saved	instance
Analysis of variance	3/7	https://en.wikipedia.org/wiki/Analysis_of_variance	reference	science, encyclopedia	2026-05-05T09:48:53.349210+00:00	kb-cron

The normal-model based ANOVA analysis assumes the independence, normality, and homogeneity of variances of the residuals. The randomization-based analysis assumes only the homogeneity of the variances of the residuals (as a consequence of unit-treatment additivity) and uses the randomization procedure of the experiment. Both these analyses require homoscedasticity, as an assumption for the normal-model analysis and as a consequence of randomization and additivity for the randomization-based analysis.

However, studies of processes that change variances rather than means (called dispersion effects) have been successfully conducted using ANOVA. There are no necessary assumptions for ANOVA in its full generality, but the F-test used for ANOVA hypothesis testing has assumptions and practical limitations which are of continuing interest.

Problems which do not satisfy the assumptions of ANOVA can often be transformed to satisfy the assumptions. The property of unit-treatment additivity is not invariant under a "change of scale", so statisticians often use transformations to achieve unit-treatment additivity. If the response variable is expected to follow a parametric family of probability distributions, then the statistician may specify (in the protocol for the experiment or observational study) that the responses be transformed to stabilize the variance. Also, a statistician may specify that logarithmic transforms be applied to responses that are believed to follow a multiplicative model. By Cauchy's functional equation, the logarithm (up to a constant factor, that is, the choice of base) is the only continuous transformation that turns multiplication of positive real numbers into addition.
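As a small illustration of the multiplicative-model case, the sketch below builds a noise-free two-way table whose cells are products of a row factor and a column factor, then checks additivity before and after taking logarithms. The data and the helper `interaction_residuals` are hypothetical, chosen only for illustration.

```python
import math

def interaction_residuals(table):
    """Residuals left after fitting grand mean + row effect + column effect."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return [[table[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)] for i in range(n_rows)]

# Hypothetical multiplicative response: y_ij = row_factor_i * col_factor_j
row_factors = [1.0, 2.0, 4.0]
col_factors = [1.0, 3.0]
raw = [[r * c for c in col_factors] for r in row_factors]
logged = [[math.log(y) for y in row] for row in raw]

# Largest departure from an additive (row + column) model on each scale
max_raw = max(abs(e) for row in interaction_residuals(raw) for e in row)
max_log = max(abs(e) for row in interaction_residuals(logged) for e in row)

print(f"max non-additivity, raw scale: {max_raw:.4f}")
print(f"max non-additivity, log scale: {max_log:.2e}")
```

On the raw scale the additive fit leaves clearly nonzero residuals, while on the log scale they vanish up to floating-point rounding, because log(y_ij) = log(row_factor_i) + log(col_factor_j).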

== Characteristics ==

ANOVA is used in the analysis of comparative experiments, those in which only the difference in outcomes is of interest. The statistical significance of the experiment is determined by a ratio of two variances. This ratio is independent of several possible alterations to the experimental observations: adding a constant to all observations does not alter significance, and multiplying all observations by a nonzero constant does not alter significance. So the ANOVA statistical significance result is independent of constant bias and scaling errors as well as the units used in expressing observations. In the era of mechanical calculation it was common to subtract a constant from all observations (when equivalent to dropping leading digits) to simplify data entry. This is an example of data coding.
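The invariance claims can be checked numerically. The following is a minimal sketch, assuming a one-way layout and made-up observations; `f_statistic` is a hypothetical helper written for this example, not a library function.

```python
def f_statistic(groups):
    """One-way ANOVA F ratio: between-treatment MS over within-treatment MS."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_treat = len(groups) - 1
    df_error = n_total - len(groups)
    return (ss_treat / df_treat) / (ss_error / df_error)

# Illustrative observations for three treatments (assumed data)
groups = [[6.0, 8.0, 4.0, 5.0, 3.0, 4.0],
          [8.0, 12.0, 9.0, 11.0, 6.0, 8.0],
          [13.0, 9.0, 11.0, 8.0, 7.0, 12.0]]

shifted = [[y + 100.0 for y in g] for g in groups]  # constant bias
scaled = [[y * 2.54 for y in g] for g in groups]    # change of units

f_original = f_statistic(groups)
f_shifted = f_statistic(shifted)
f_scaled = f_statistic(scaled)

print(f_original, f_shifted, f_scaled)
```

All three printed values agree up to rounding: a shift cancels in every deviation from a mean, and a scale factor multiplies numerator and denominator by the same squared constant.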

== Algorithm ==

The calculations of ANOVA can be characterized as computing a number of means and variances, dividing two variances, and comparing the ratio to a handbook value to determine statistical significance. Calculating a treatment effect is then trivial: "the effect of any treatment is estimated by taking the difference between the mean of the observations which receive the treatment and the general mean".
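The steps just described can be sketched for a balanced one-way design. The data are invented for illustration, and the "handbook value" 4.26 is an assumption: it is the 5% critical value for (2, 9) degrees of freedom as quoted in standard F tables, hard-coded here rather than computed.

```python
# Hypothetical observations: three treatments, four units each
data = {
    "A": [4.0, 5.0, 6.0, 5.0],
    "B": [7.0, 8.0, 9.0, 8.0],
    "C": [5.0, 6.0, 7.0, 6.0],
}

all_obs = [y for ys in data.values() for y in ys]
grand_mean = sum(all_obs) / len(all_obs)

# Step 1: means, and treatment effects as (treatment mean - general mean)
means = {name: sum(ys) / len(ys) for name, ys in data.items()}
effects = {name: m - grand_mean for name, m in means.items()}

# Step 2: the two variances (mean squares)
ss_treat = sum(len(ys) * effects[name] ** 2 for name, ys in data.items())
ss_error = sum((y - means[name]) ** 2 for name, ys in data.items() for y in ys)
ms_treat = ss_treat / (len(data) - 1)
ms_error = ss_error / (len(all_obs) - len(data))

# Step 3: divide the variances and compare with the tabulated value
f_ratio = ms_treat / ms_error
HANDBOOK_F_05_2_9 = 4.26  # assumed table entry: alpha = 0.05, df = (2, 9)
significant = f_ratio > HANDBOOK_F_05_2_9

print(effects)
print(f"F = {f_ratio:.2f}, significant at 5%: {significant}")
```

In a balanced design the estimated effects sum to zero, which is a quick sanity check on the arithmetic.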

=== Partitioning of the sum of squares ===

ANOVA uses traditional standardized terminology. The definitional equation of sample variance is

$$s^{2}=\frac{1}{n-1}\sum_{i}(y_{i}-\bar{y})^{2}$$

, where the divisor is called the degrees of freedom (DF), the summation is called the sum of squares (SS), the result is called the mean square (MS) and the squared terms are deviations from the sample mean. ANOVA estimates three sample variances: a total variance based on all the observation deviations from the grand mean, an error variance based on all the observation deviations from their appropriate treatment means, and a treatment variance. The treatment variance is based on the deviations of treatment means from the grand mean, the result being multiplied by the number of observations in each treatment to account for the difference between the variance of observations and the variance of means.

The fundamental technique is a partitioning of the total sum of squares SS into components related to the effects used in the model. For example, for a simplified ANOVA with one type of treatment at different levels, the partition is

$$SS_{\text{Total}}=SS_{\text{Error}}+SS_{\text{Treatments}}$$

The number of degrees of freedom DF can be partitioned in a similar way: one of these components (that for error) specifies a chi-squared distribution which describes the associated sum of squares, while the same is true for "treatments" if there is no treatment effect.

$$DF_{\text{Total}}=DF_{\text{Error}}+DF_{\text{Treatments}}$$
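Both partitions can be verified on a small data set. The sketch below uses invented, deliberately unbalanced groups to show that the identities do not depend on equal treatment sizes; the variable names are chosen for this example only.

```python
import statistics

# Hypothetical unbalanced one-way layout (3, 2 and 4 observations)
groups = [[2.0, 3.0, 4.0], [5.0, 7.0], [8.0, 9.0, 10.0, 9.0]]

all_obs = [y for g in groups for y in g]
n_total = len(all_obs)
grand_mean = sum(all_obs) / n_total
group_means = [sum(g) / len(g) for g in groups]

# Sums of squares: total, within treatments (error), between treatments
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

# Degrees of freedom for each component
df_total = n_total - 1
df_treat = len(groups) - 1
df_error = n_total - len(groups)

# Mean square = sum of squares / degrees of freedom; the total mean
# square is the ordinary sample variance of all observations.
ms_total = ss_total / df_total
sample_variance = statistics.variance(all_obs)

print(ss_total, ss_error + ss_treat)   # the two sides of the SS partition
print(df_total, df_error + df_treat)   # the two sides of the DF partition
```

Note that each treatment's squared deviation is weighted by its own number of observations, which is what makes the sum-of-squares identity hold for unequal group sizes.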

=== The F-test ===

The F-test is used for comparing the factors of the total deviation. For example, in one-way, or single-factor ANOVA, statistical significance is tested for by comparing the F test statistic

$$F=\frac{\text{variance between treatments}}{\text{variance within treatments}}$$

$$F=\frac{MS_{\text{Treatments}}}{MS_{\text{Error}}}=\frac{SS_{\text{Treatments}}/(I-1)}{SS_{\text{Error}}/(n_{T}-I)}$$

where MS is mean square, $I$ is the number of treatments and $n_{T}$ is the total number of cases, to the F-distribution with $I-1$ being the numerator degrees of freedom and $n_{T}-I$ the denominator degrees of freedom. Using the F-distribution is a natural candidate because the test statistic is the ratio of two scaled sums of squares each of which follows a scaled chi-squared distribution.

The expected value of F is approximately $1+n\sigma_{\text{Treatment}}^{2}/\sigma_{\text{Error}}^{2}$ (the ratio of the expected mean squares, where $n$ is the treatment sample size), which is 1 for no treatment effect. As values of F increase above 1, the evidence is increasingly inconsistent with the null hypothesis. Two apparent experimental methods of increasing F are increasing the sample size and reducing the error variance by tight experimental controls.

There are two methods of concluding the ANOVA hypothesis test, both of which produce the same result:
