| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Confounding | 1/5 | https://en.wikipedia.org/wiki/Confounding | reference | science, encyclopedia | 2026-05-05T09:49:43.772485+00:00 | kb-cron |
In causal inference, confounding is a form of systematic error (or bias) that can distort estimates of causal effects in observational studies. A confounder is traditionally understood to be a variable that (1) independently predicts the outcome (or dependent variable), (2) is associated with the exposure (or independent variable), and (3) is not on the causal pathway between the exposure and the outcome. Failure to control for a confounder results in a spurious association between exposure and outcome. Confounding is a causal concept rather than a purely statistical one, and therefore cannot be fully described by correlations or associations alone. The presence of confounders helps explain why correlation does not imply causation, and why careful study design and analytical methods (such as randomization, statistical adjustment, or causal diagrams) are required to distinguish causal effects from spurious associations. Several notation systems and formal frameworks, such as causal directed acyclic graphs (DAGs), have been developed to represent and detect confounding, making it possible to identify when a variable must be controlled for in order to obtain an unbiased estimate of a causal effect. Confounders are threats to internal validity.
== Definition ==
Confounding is defined in terms of the data generating model. Let X be an exposure (or independent variable), and let Y be the outcome (or dependent variable). Traditionally, a variable Z was considered to confound the relationship between X and Y if Z (1) independently predicts Y, (2) is associated with X, and (3) is not on the causal pathway between X and Y. Not controlling for Z introduces a spurious relationship between X and Y.
However, developments in causal inference over the past decades have shown that this definition of confounding is inadequate, because there can be pre-exposure variables associated with the outcome that, when controlled for, introduce rather than eliminate bias. Modern causal inference therefore typically defines a confounder in terms of the minimally sufficient adjustment set.
Formally, a set of variables Z is a sufficient adjustment set for the effect of X on Y if, conditional on Z, the potential outcomes are independent of X. That is, after adjusting for Z, the exposed and unexposed groups are exchangeable with respect to the outcome. A minimally sufficient adjustment set is an adjustment set Z in which every member is required to control for confounding. Under this framework, a confounder is defined as a member of the minimally sufficient adjustment set.
In the language of directed acyclic graphs, confounding corresponds to the presence of one or more open backdoor paths between X and Y. A set of variables Z is a sufficient adjustment set if conditioning on Z blocks all backdoor paths from X to Y. The set is minimally sufficient if no proper subset of Z satisfies this property: removing any variable from a minimally sufficient set reopens at least one backdoor path.
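The graphical definition above can be checked mechanically on small graphs. The following is a minimal sketch rather than a reference implementation: it enumerates backdoor paths by brute force (exponential in graph size) and assumes the graph is given as a list of directed edges. All function names and both example graphs are hypothetical, introduced here for illustration only. The second graph is an assumed "M-shaped" structure, one way a pre-exposure variable can introduce bias when controlled for.

```python
from itertools import combinations


def descendants(edges, node):
    """All nodes reachable from `node` by following directed edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for nxt in children.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def backdoor_paths(edges, x, y):
    """Simple paths from x to y (ignoring direction) whose first edge points into x."""
    edge_set = set(edges)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        last = path[-1]
        if last == y:
            paths.append(path)
            return
        for nxt in sorted(neighbours.get(last, ())):
            if nxt in path:
                continue
            if len(path) == 1 and (nxt, x) not in edge_set:
                continue  # a backdoor path must start with an arrow into x
            walk(path + [nxt])

    walk([x])
    return paths


def is_blocked(edges, path, z):
    """True if conditioning on the set z blocks the given path."""
    edge_set = set(edges)
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edge_set and (nxt, mid) in edge_set
        if collider:
            # A collider blocks the path unless it, or a descendant, is in z.
            if mid not in z and not (descendants(edges, mid) & z):
                return True
        elif mid in z:
            # A conditioned-on non-collider blocks the path.
            return True
    return False


def is_sufficient(edges, x, y, z):
    """Backdoor criterion: no descendant of x in z, and every backdoor path blocked."""
    z = set(z)
    if z & descendants(edges, x):
        return False
    return all(is_blocked(edges, p, z) for p in backdoor_paths(edges, x, y))


def is_minimally_sufficient(edges, x, y, z):
    """Sufficient, and no proper subset of z is sufficient."""
    z = set(z)
    return is_sufficient(edges, x, y, z) and not any(
        is_sufficient(edges, x, y, set(sub))
        for r in range(len(z))
        for sub in combinations(sorted(z), r)
    )


# Hypothetical graph 1: Z is a common cause of X and Y.
confounded = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
# Hypothetical graph 2: Z is a pre-exposure collider between two unobserved causes.
m_shaped = [("U1", "X"), ("U1", "Z"), ("U2", "Z"), ("U2", "Y"), ("X", "Y")]

needs_z = is_minimally_sufficient(confounded, "X", "Y", {"Z"})
empty_ok_in_m = is_sufficient(m_shaped, "X", "Y", set())
z_ok_in_m = is_sufficient(m_shaped, "X", "Y", {"Z"})
```

In the first graph `{Z}` is minimally sufficient, so Z is a confounder in the modern sense. In the second graph the empty set is already sufficient, while adjusting for Z alone opens the path through the collider, even though Z is associated with both X and Y.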
== Examples ==
=== Simple example ===
A trucking company compares the fuel economy of trucks from two manufacturers ("A" and "B") by measuring miles per gallon (MPG) over one month. They find that A trucks appear more fuel-efficient. However, A trucks are more often assigned highway routes while B trucks are more often assigned city routes. Here, truck make is the independent variable, MPG is the dependent variable, and route type (or proportion of city driving) is the confounder. Because route type affects MPG and differs across truck makes, it confounds the comparison. The observed difference therefore likely reflects highway vs. city driving rather than truck make.
=== Relationship between birth order and Down syndrome ===
A scientist is studying the relationship between birth order (1st child, 2nd child, etc.) and the presence of Down syndrome in the child. However, it is known that:
* Higher maternal age is directly associated with Down syndrome in the child.
* Higher maternal age is directly associated with Down syndrome, regardless of birth order (a mother having her 1st vs. 3rd child at age 50 confers the same risk).
* Maternal age is directly associated with birth order (the 2nd child, except in the case of twins, is born when the mother is older than she was for the birth of the 1st child).
* Maternal age is not a consequence of birth order (having a 2nd child does not change the mother's age).
In this scenario, maternal age is a confounding variable, as it influences both the independent variable (birth order) and the dependent variable (Down syndrome).
=== Relationship between smoking and lung disease ===
A scientist is studying the relationship between smoking status (smoker vs. non-smoker) and the presence of lung disease. However, it is known that:
* Alcohol consumption and diet are directly associated with lung disease and overall health.
* Alcohol consumption and diet affect health regardless of smoking status (a smoker and non-smoker with similar alcohol use and diet may have similar health risks).
* Alcohol consumption and diet are associated with smoking status (smokers are, on average, more likely to drink alcohol or have less healthy diets than non-smokers).
* Alcohol consumption and diet are not consequences of smoking itself (smoking does not necessarily cause higher alcohol intake or poor diet, even if they are correlated).
In this scenario, alcohol consumption or diet is a confounding variable, because it influences both the independent variable (smoking status) and the dependent variable (health outcome). If these factors are not controlled, the observed association between smoking and lung disease may be partly or entirely due to differences in alcohol use or diet rather than smoking itself.
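The trucking example can be made concrete with a small calculation. The sketch below uses entirely made-up numbers (they do not come from any study): both makes are assumed to get 8 MPG on highway routes and 5 MPG on city routes, and only the route mix differs. It compares the crude difference in mean MPG with a difference standardized to the fleet-wide route mix, which is one simple form of statistical adjustment. The variable and function names are hypothetical.

```python
# Hypothetical records of (make, route, mpg). All numbers are invented for
# illustration: identical MPG per route for both makes, different route mix.
records = (
    [("A", "highway", 8.0)] * 80 + [("A", "city", 5.0)] * 20
    + [("B", "highway", 8.0)] * 20 + [("B", "city", 5.0)] * 80
)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def crude_mean(make):
    """Mean MPG for a make, ignoring route type."""
    return mean(mpg for m, _, mpg in records if m == make)


def stratum_mean(make, route):
    """Mean MPG for a make within one route type."""
    return mean(mpg for m, r, mpg in records if m == make and r == route)


def standardized_mean(make):
    """Average the stratum means, weighted by the route mix of the whole fleet."""
    routes = sorted({r for _, r, _ in records})
    total = len(records)
    return sum(
        stratum_mean(make, route) * sum(1 for _, r, _ in records if r == route) / total
        for route in routes
    )


crude_gap = crude_mean("A") - crude_mean("B")
adjusted_gap = standardized_mean("A") - standardized_mean("B")
print(f"crude gap: {crude_gap:.2f} MPG, adjusted gap: {adjusted_gap:.2f} MPG")
```

With these assumed numbers the crude comparison favors make A by about 1.8 MPG, while the route-adjusted gap is zero. The apparent advantage comes entirely from A trucks driving more highway routes.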
== Control ==
Consider a researcher attempting to assess the effectiveness of drug X from population data in which drug usage was a patient's choice. The data show that gender (Z) influences a patient's choice of drug as well as their chances of recovery (Y). In this scenario, gender Z confounds the relation between X and Y, since Z is a cause of both X and Y.
We have that