kb/Analysis_of_variance-1.md at fce93de38ce652fb78056647618bc02dd6c80dae

turtle89431 292594baa5 Scrape wikipedia-science: 5966 new, 3181 updated, 9417 total (kb-cron)

2026-05-05 02:49:05 -07:00


title	chunk	source	category	tags	date_saved	instance
Analysis of variance	2/7	https://en.wikipedia.org/wiki/Analysis_of_variance	reference	science, encyclopedia	2026-05-05T09:48:53.349210+00:00	kb-cron

== Classes of models ==

There are three classes of models used in the analysis of variance, and these are outlined here.

=== Fixed-effects models ===

The fixed-effects model (class I) of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.

=== Random-effects models ===

The random-effects model (class II) is used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from the fixed-effects model.

=== Mixed-effects models ===

A mixed-effects model (class III) contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.

=== Example ===

Teaching experiments could be performed by a college or university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives. Defining fixed and random effects has proven elusive, with multiple competing definitions.
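One common way to write down the three classes, given here as an illustrative sketch and not as a definition taken from the text, uses a simple layout with normal errors. The symbols $\mu$, $\tau_j$, $\alpha_i$, $b_j$, $\sigma_\tau^2$, $\sigma_b^2$ and the indexing are notation assumed for this sketch only.

```latex
% Fixed effects (class I): the \tau_j are unknown constants, one per chosen text
y_{ij} = \mu + \tau_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)

% Random effects (class II): the levels are sampled, so \tau_j is itself random
y_{ij} = \mu + \tau_j + \varepsilon_{ij}, \qquad \tau_j \sim N(0, \sigma_\tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)

% Mixed effects (class III): \alpha_i fixed, b_j random
y_{ijk} = \mu + \alpha_i + b_j + \varepsilon_{ijk}, \qquad b_j \sim N(0, \sigma_b^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2)
```

In this notation, a fixed-effects analysis asks about the particular $\tau_j$, while a random-effects analysis asks whether $\sigma_\tau^2$ is greater than zero.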

== Assumptions ==

The analysis of variance has been studied from several approaches, the most common of which uses a linear model that relates the response to the treatments and blocks. Note that the model is linear in parameters but may be nonlinear across factor levels. Interpretation is easy when data is balanced across factors but much deeper understanding is needed for unbalanced data.

=== Textbook analysis using a normal distribution ===

The analysis of variance can be presented in terms of a linear model, which makes the following assumptions about the probability distribution of the responses:

* Independence of observations – this is an assumption of the model that simplifies the statistical analysis.
* Normality – the distributions of the residuals are normal.
* Equality (or "homogeneity") of variances, called homoscedasticity – the variance of data in groups should be the same.

The separate assumptions of the textbook model imply that the errors are independently, identically, and normally distributed for fixed-effects models, that is, that the errors ($\varepsilon$) are independent and

$$\varepsilon \sim N(0, \sigma^{2}).$$
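The textbook assumptions can be made concrete with a small simulation. The following is a minimal sketch, assuming a one-way layout in which each response is a common mean plus a group effect plus an independent normal error. The function names `one_way_f` and `simulate`, the group effects, the sample size, and the seed are all hypothetical choices for illustration.

```python
import random
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    k = len(groups)                      # number of treatment groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def simulate(effects, n_per_group, sigma, rng, mu=10.0):
    """Draw y = mu + effect + eps with independent eps ~ N(0, sigma^2)."""
    return [
        [mu + tau + rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        for tau in effects
    ]


rng = random.Random(0)                                  # fixed seed, arbitrary
null_data = simulate([0.0, 0.0, 0.0], 20, 1.0, rng)     # no treatment effect
shifted = simulate([0.0, 0.0, 3.0], 20, 1.0, rng)       # one group shifted
print(one_way_f(null_data), one_way_f(shifted))
```

When one group is shifted by several error standard deviations, the F statistic should be far larger than in the no-effect case, which is the behaviour the normal-theory test relies on.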

=== Randomization-based analysis ===

In a randomized controlled experiment, the treatments are randomly assigned to experimental units, following the experimental protocol. This randomization is objective and declared before the experiment is carried out. The objective random assignment is used to test the significance of the null hypothesis, following the ideas of C. S. Peirce and Ronald Fisher. This design-based analysis was discussed and developed by Francis J. Anscombe at Rothamsted Experimental Station and by Oscar Kempthorne at Iowa State University. Kempthorne and his students make an assumption of unit-treatment additivity, which is discussed in the books of Kempthorne and David R. Cox.
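To illustrate how random assignment alone can drive a significance test, here is a minimal sketch of an exact randomization test for two treatments. It assumes the sharp null hypothesis that the treatment has no effect on any unit, so the responses stay fixed while the assignment varies. The function name `randomization_p_value`, the choice of the absolute difference in means as the test statistic, and the data are hypothetical and are not taken from Kempthorne's or Cox's treatment.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(responses, treated_idx):
    """Exact p-value over all equally likely assignments of the same size."""
    n = len(responses)
    k = len(treated_idx)

    def stat(idx):
        # Absolute difference in mean response between the two arms.
        treated = [responses[i] for i in idx]
        control = [responses[i] for i in range(n) if i not in idx]
        return abs(mean(treated) - mean(control))

    observed = stat(set(treated_idx))
    # Under the sharp null, every way of choosing k treated units is equally likely.
    stats = [stat(set(c)) for c in combinations(range(n), k)]
    # Small tolerance guards against floating-point ties.
    return sum(s >= observed - 1e-12 for s in stats) / len(stats)


responses = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]   # hypothetical observed responses
p = randomization_p_value(responses, [3, 4, 5])
print(p)  # 2 of the 20 possible assignments are as extreme: 0.1
```

No distributional assumption enters the calculation; the reference distribution comes entirely from the set of assignments the protocol could have produced.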

==== Unit-treatment additivity ====

In its simplest form, the assumption of unit-treatment additivity states that the observed response $y_{i,j}$ from experimental unit $i$ when receiving treatment $j$ can be written as the sum of the unit's response $y_{i}$ and the treatment effect $t_{j}$, that is

$$y_{i,j} = y_{i} + t_{j}.$$

The assumption of unit-treatment additivity implies that, for every treatment $j$, the $j$th treatment has exactly the same effect $t_{j}$ on every experimental unit. The assumption of unit-treatment additivity usually cannot be directly falsified, according to Cox and Kempthorne. However, many consequences of unit-treatment additivity can be falsified. For a randomized experiment, the assumption of unit-treatment additivity implies that the variance is constant for all treatments. Therefore, by contraposition, a necessary condition for unit-treatment additivity is that the variance is constant. The use of unit-treatment additivity and randomization is similar to the design-based inference that is standard in finite-population survey sampling.
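The link between additivity and constant variance can be checked on a small table of potential responses. The sketch below is illustrative only: the unit responses, the treatment effects, and the contrasting multiplicative table are hypothetical numbers, and `column_variances` is a helper name introduced here.

```python
from statistics import pvariance

unit_responses = [4.1, 5.0, 5.6, 6.3, 7.2, 8.0]   # hypothetical y_i
treatment_effects = [0.0, 1.5, -2.0]              # hypothetical t_j


def column_variances(table):
    """Variance across units of the potential responses under each treatment."""
    n_treatments = len(table[0])
    return [pvariance([row[j] for row in table]) for j in range(n_treatments)]


# Additive table: y_{i,j} = y_i + t_j, so each column is a shifted copy of y_i.
additive = [[y + t for t in treatment_effects] for y in unit_responses]

# Non-additive contrast: the treatment scales each unit's response instead.
scaling = [[y * m for m in (1.0, 1.5, 2.0)] for y in unit_responses]

additive_vars = column_variances(additive)
scaling_vars = column_variances(scaling)
print(additive_vars)   # equal across treatments, up to rounding
print(scaling_vars)    # grows with the scaling factor
```

A constant shift leaves the spread of the unit responses unchanged, so unequal variances across treatments are evidence against additivity, while equal variances do not establish it.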

==== Derived linear model ====

Kempthorne uses the randomization-distribution and the assumption of unit treatment additivity to produce a derived linear model, very similar to the textbook model discussed previously. The test statistics of this derived linear model are closely approximated by the test statistics of an appropriate normal linear model, according to approximation theorems and simulation studies. However, there are differences. For example, the randomization-based analysis results in a small but (strictly) negative correlation between the observations. In the randomization-based analysis, there is no assumption of a normal distribution and certainly no assumption of independence. On the contrary, the observations are dependent! The randomization-based analysis has the disadvantage that its exposition involves tedious algebra and extensive time. Since the randomization-based analysis is complicated and is closely approximated by the approach using a normal linear model, most teachers emphasize the normal linear model approach. Few statisticians object to model-based analysis of balanced randomized experiments.
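One way to see where the negative correlation comes from is to enumerate every random assignment of a small, fixed set of units. The sketch below assumes a completely randomized assignment of $N$ units to $N$ slots, with each permutation equally likely, and computes the exact correlation between the responses that land in two distinct slots. The unit values and the function name `slot_correlation` are hypothetical; this is a finite-population illustration, not Kempthorne's derivation.

```python
from itertools import permutations
from statistics import mean, pvariance


def slot_correlation(unit_values):
    """Exact correlation between the responses in slots 0 and 1 when units
    are placed into slots by a uniformly random permutation."""
    perms = list(permutations(unit_values))
    first = [p[0] for p in perms]
    second = [p[1] for p in perms]
    m = mean(unit_values)
    # Each slot is marginally a uniform draw from the units, so both slots
    # have mean m and variance pvariance(unit_values).
    cov = mean([(a - m) * (b - m) for a, b in zip(first, second)])
    return cov / pvariance(unit_values)


units = [4.1, 5.0, 5.6, 6.3, 7.2]    # hypothetical unit responses, N = 5
rho = slot_correlation(units)
print(rho, -1 / (len(units) - 1))    # both approximately -0.25
```

Because the units are assigned without replacement, a high value in one slot makes a high value in another slightly less likely. The correlation is $-1/(N-1)$, which is strictly negative and shrinks as $N$ grows.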

==== Statistical models for observational data ====

However, when applied to data from non-randomized experiments or observational studies, model-based analysis lacks the warrant of randomization. For observational data, the derivation of confidence intervals must use subjective models, as emphasized by Ronald Fisher and his followers. In practice, the estimates of treatment effects from observational studies are often inconsistent. In practice, "statistical models" and observational data are useful for suggesting hypotheses that should be treated very cautiously by the public.

=== Summary of assumptions ===
