kb/data/en.wikipedia.org/wiki/Confounding-1.md

| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Confounding | 2/5 | https://en.wikipedia.org/wiki/Confounding | reference | science, encyclopedia | 2026-05-05T09:49:43.772485+00:00 | kb-cron |

because the observational quantity contains information about the correlation between X and Z, and the interventional quantity does not (since X is not correlated with Z in a randomized experiment). It can be shown that, when only observational data are available, an unbiased estimate of the desired quantity $P(y \mid \text{do}(x))$ can be obtained by "adjusting" for all confounding factors, that is, by conditioning on their various values and averaging the result. In the case of a single confounder Z, this leads to the "adjustment formula":

$$P(y \mid \text{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z) \qquad (3)$$

which gives an unbiased estimate for the causal effect of X on Y. The same adjustment formula works when there are multiple confounders, except that in this case the set Z of variables that guarantees unbiased estimates must be chosen with caution. The criterion for a proper choice of variables is called the Back-Door criterion and requires that the chosen set Z "blocks" (or intercepts) every path between X and Y that contains an arrow into X. Such sets are called "Back-Door admissible" and may include variables which are not common causes of X and Y, but merely proxies thereof. Returning to the drug use example, since Z complies with the Back-Door requirement (i.e., it intercepts the one Back-Door path $X \leftarrow Z \rightarrow Y$), the Back-Door adjustment formula is valid:

$$P(Y = \text{recovered} \mid \text{do}(X = \text{give drug})) = \sum_{z} P(Y = \text{recovered} \mid X = \text{give drug}, Z = z)\, P(Z = z)$$
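As a rough numerical illustration of Back-Door adjustment with a single binary confounder, the sketch below builds a small joint distribution and compares the unadjusted contrast P(Y=1 | X=1) − P(Y=1 | X=0) with the adjusted contrast obtained by conditioning on each value of Z and averaging over P(z). The model structure (Z → X, Z → Y, X → Y), the probability tables, and all function names are assumptions made up for this example; they are not taken from the drug study discussed in the text.

```python
from itertools import product

# Hypothetical model, all numbers invented for illustration:
# Z -> X, Z -> Y and X -> Y, with every variable binary.
P_Z = {0: 0.5, 1: 0.5}                       # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {(0, 0): 0.3, (1, 0): 0.5,   # P(Y = 1 | X = x, Z = z)
                 (0, 1): 0.6, (1, 1): 0.8}


def joint(x, y, z):
    """Joint probability P(X = x, Y = y, Z = z) implied by the tables above."""
    px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
    py = P_Y1_GIVEN_XZ[(x, z)] if y == 1 else 1 - P_Y1_GIVEN_XZ[(x, z)]
    return P_Z[z] * px * py


def prob(event):
    """Probability of the event described by a predicate over (x, y, z)."""
    return sum(joint(x, y, z)
               for x, y, z in product((0, 1), repeat=3)
               if event(x, y, z))


def unadjusted(x):
    """Observational quantity P(Y = 1 | X = x)."""
    return (prob(lambda a, b, c: a == x and b == 1)
            / prob(lambda a, b, c: a == x))


def adjusted(x):
    """Adjustment formula: sum over z of P(Y = 1 | X = x, Z = z) * P(Z = z)."""
    total = 0.0
    for z in (0, 1):
        p_y_given_xz = (prob(lambda a, b, c: a == x and b == 1 and c == z)
                        / prob(lambda a, b, c: a == x and c == z))
        p_z = prob(lambda a, b, c: c == z)
        total += p_y_given_xz * p_z
    return total


naive_effect = unadjusted(1) - unadjusted(0)
adjusted_effect = adjusted(1) - adjusted(0)
print(f"unadjusted contrast: {naive_effect:.2f}")
print(f"adjusted contrast:   {adjusted_effect:.2f}")
```

With these made-up tables the unadjusted contrast is 0.38, while the adjusted contrast is 0.20, which is the effect of X built into the model (a 0.2 increase in P(Y=1) within each stratum of Z). The gap between the two numbers is the confounding bias contributed by Z.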

In this way the physician can predict the likely effect of administering the drug from observational studies in which the conditional probabilities appearing on the right-hand side of the equation can be estimated by regression.

Contrary to common belief, adding covariates to the adjustment set Z can introduce bias. A typical counterexample occurs when Z is a common effect of X and Y, a case in which Z is not a confounder (i.e., the null set is Back-Door admissible) and adjusting for Z would create bias known as "collider bias" or "Berkson's paradox". Controls that are not good confounders are sometimes called bad controls.

In general, confounding can be controlled by adjustment if and only if there is a set of observed covariates that satisfies the Back-Door condition. Moreover, if Z is such a set, then the adjustment formula of Eq. (3) is valid. Pearl's do-calculus provides all possible conditions under which $P(y \mid \text{do}(x))$ can be estimated, not necessarily by adjustment.
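The collider case described above can be checked on a toy distribution. In the sketch below, X and Y are independent by construction and Z is a common effect of both (X → Z ← Y), so the true effect of X on Y is zero. The probability table and function names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from itertools import product

# Hypothetical collider model: X and Y are independent fair coins,
# and Z depends on both of them (X -> Z <- Y).
P_Z1_GIVEN_XY = {(0, 0): 0.1, (1, 0): 0.5,   # P(Z = 1 | X = x, Y = y)
                 (0, 1): 0.5, (1, 1): 0.9}


def joint(x, y, z):
    """Joint probability P(X = x, Y = y, Z = z) for the collider model."""
    pz1 = P_Z1_GIVEN_XY[(x, y)]
    return 0.5 * 0.5 * (pz1 if z == 1 else 1 - pz1)


def prob(event):
    """Probability of the event described by a predicate over (x, y, z)."""
    return sum(joint(x, y, z)
               for x, y, z in product((0, 1), repeat=3)
               if event(x, y, z))


def unadjusted(x):
    """P(Y = 1 | X = x): the empty adjustment set, which is admissible here."""
    return (prob(lambda a, b, c: a == x and b == 1)
            / prob(lambda a, b, c: a == x))


def adjusted_for_collider(x):
    """Adjustment formula applied (wrongly) to the collider Z."""
    total = 0.0
    for z in (0, 1):
        p_y_given_xz = (prob(lambda a, b, c: a == x and b == 1 and c == z)
                        / prob(lambda a, b, c: a == x and c == z))
        p_z = prob(lambda a, b, c: c == z)
        total += p_y_given_xz * p_z
    return total


no_adjustment = unadjusted(1) - unadjusted(0)
collider_adjustment = adjusted_for_collider(1) - adjusted_for_collider(0)
print(f"contrast without adjustment: {no_adjustment:.3f}")
print(f"contrast adjusting for Z:    {collider_adjustment:.3f}")
```

Without adjustment the contrast is zero, matching the absence of any effect of X on Y in this toy model. Applying the adjustment formula to Z yields a contrast of −4/21 (about −0.190), a spurious negative association created purely by conditioning on the common effect.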

== History ==

According to Morabia (2011), the word confounding derives from the Medieval Latin verb "confundere", which meant "mixing", and was probably chosen to represent the confusion (from Latin: con=with + fusus=mix or fuse together) between the cause one wishes to assess and other causes that may affect the outcome and thus confuse, or stand in the way of the desired assessment. Greenland, Robins and Pearl note an early use of the term "confounding" in causal inference by John Stuart Mill in 1843.

Fisher introduced the word "confounding" in his 1935 book "The Design of Experiments" to refer specifically to a consequence of blocking (i.e., partitioning) the set of treatment combinations in a factorial experiment, whereby certain interactions may be "confounded with blocks". This popularized the notion of confounding in statistics, although Fisher was concerned with the control of heterogeneity in experimental units, not with causal inference.

According to Vandenbroucke (2004) it was Kish who used the word "confounding" in the sense of "incomparability" of two or more groups (e.g., exposed and unexposed) in an observational study. Formal conditions defining what makes certain groups "comparable" and others "incomparable" were later developed in epidemiology by Greenland and Robins (1986) using the counterfactual language of Neyman (1935) and Rubin (1974). These were later supplemented by graphical criteria such as the Back-Door condition (Pearl 1993; Greenland, Robins and Pearl 1999). Graphical criteria were shown to be formally equivalent to the counterfactual definition but more transparent to researchers relying on process models.

== Types ==

In risk assessments that evaluate the magnitude and nature of risk to human health, it is important to control for confounding in order to isolate the effect of a particular hazard such as a food additive, pesticide, or new drug. For prospective studies, it is difficult to recruit and screen for volunteers with the same background (age, diet, education, geography, etc.), and in historical studies there can be similar variability. Because this variability among volunteers cannot be controlled, confounding is a particular challenge in human studies. For these reasons, experiments offer a way to avoid most forms of confounding.

In some disciplines, confounding is categorized into different types. In epidemiology, one type is "confounding by indication", which arises in observational studies. Because prognostic factors may influence treatment decisions (and bias estimates of treatment effects), controlling for known prognostic factors may reduce this problem, but it is always possible that a forgotten or unknown factor was not included or that factors interact complexly. Confounding by indication has been described as the most important limitation of observational studies. Randomized trials are not affected by confounding by indication because of random assignment.

Confounding variables may also be categorised according to their source: the choice of measurement instrument (operational confound), situational characteristics (procedural confound), or inter-individual differences (person confound).

An operational confound can occur in both experimental and non-experimental research designs. This type of confound occurs when a measure designed to assess a particular construct inadvertently measures something else as well. A procedural confound can occur in a laboratory experiment or a quasi-experiment. This type of confound occurs when the researcher mistakenly allows another variable to change along with the manipulated independent variable. A person confound occurs when two or more groups of units are analyzed together (e.g., workers from different occupations), even though they vary according to one or more other (observed or unobserved) characteristics (e.g., gender).