| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Causality | 4/12 | https://en.wikipedia.org/wiki/Causality | reference | science, encyclopedia | 2026-05-05T03:43:20.686725+00:00 | kb-cron |
Interpreting causation as a deterministic relation means that if A causes B, then A must always be followed by B. In this sense, war does not cause deaths, nor does smoking cause cancer or emphysema. As a result, many turn to a notion of probabilistic causation. Informally, A ("The person is a smoker") probabilistically causes B ("The person has now or will have cancer at some time in the future") if the information that A occurred increases the likelihood of B's occurrence. Formally, P{B|A} ≥ P{B}, where P{B|A} is the conditional probability that B will occur given the information that A occurred, and P{B} is the probability that B will occur with no knowledge of whether A did or did not occur.
This intuitive condition is not adequate as a definition of probabilistic causation because it is too general and therefore does not match our intuitive notion of cause and effect. For example, if A denotes the event "The person is a smoker," B denotes the event "The person now has or will have cancer at some time in the future" and C denotes the event "The person now has or will have emphysema some time in the future," then the following three relationships hold: P{B|A} ≥ P{B}, P{C|A} ≥ P{C} and P{B|C} ≥ P{B}. The last relationship states that knowing that the person has emphysema increases the likelihood that the person will have cancer. The reason is that the information that the person has emphysema increases the likelihood that the person is a smoker, which in turn increases the likelihood that the person will have cancer. However, we would not want to conclude that having emphysema causes cancer.
Thus, additional conditions are needed, such as a temporal relationship of A to B and a rational explanation of the mechanism of action. This last requirement is hard to quantify, and different authors therefore prefer somewhat different definitions.
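The emphysema example can be checked numerically. The following is a minimal sketch, assuming a toy model in which smoking (A) raises the probability of both cancer (B) and emphysema (C), and B and C have no influence on each other. All probabilities are hypothetical values chosen for illustration, not epidemiological data.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration (not real data).
P_SMOKER = 0.3                                  # P(A)
P_CANCER_GIVEN = {True: 0.20, False: 0.05}      # P(B | A), P(B | not A)
P_EMPHYSEMA_GIVEN = {True: 0.30, False: 0.02}   # P(C | A), P(C | not A)


def build_joint():
    """Joint distribution over (A, B, C); B and C each depend on A only."""
    table = {}
    for a, b, c in product([True, False], repeat=3):
        p_a = P_SMOKER if a else 1 - P_SMOKER
        p_b = P_CANCER_GIVEN[a] if b else 1 - P_CANCER_GIVEN[a]
        p_c = P_EMPHYSEMA_GIVEN[a] if c else 1 - P_EMPHYSEMA_GIVEN[a]
        table[(a, b, c)] = p_a * p_b * p_c
    return table


def prob(table, event, given=lambda a, b, c: True):
    """P(event | given), computed by summing entries of the joint table."""
    numerator = sum(p for k, p in table.items() if given(*k) and event(*k))
    denominator = sum(p for k, p in table.items() if given(*k))
    return numerator / denominator


table = build_joint()
p_b = prob(table, lambda a, b, c: b)
p_c = prob(table, lambda a, b, c: c)
p_b_given_a = prob(table, lambda a, b, c: b, lambda a, b, c: a)
p_c_given_a = prob(table, lambda a, b, c: c, lambda a, b, c: a)
p_b_given_c = prob(table, lambda a, b, c: b, lambda a, b, c: c)
# Once smoking status is known, emphysema adds no information about cancer.
p_b_given_a_and_c = prob(table, lambda a, b, c: b, lambda a, b, c: a and c)

print(f"P(B)={p_b:.3f}  P(B|A)={p_b_given_a:.3f}  P(B|C)={p_b_given_c:.3f}")
print(f"P(B|A,C)={p_b_given_a_and_c:.3f}")
```

With these assumed numbers, P{B} is 0.095 while P{B|C} is roughly 0.18, so the probability-raising condition holds for emphysema and cancer even though the model contains no arrow between them. Conditioning on A as well brings the value back to P{B|A}, which is one way to see that the association runs entirely through smoking.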
=== Causal calculus ===
When experimental interventions are infeasible or illegal, the derivation of a cause-and-effect relationship from observational studies must rest on some qualitative theoretical assumptions, for example, that symptoms do not cause diseases, usually expressed in the form of missing arrows in causal graphs such as Bayesian networks or path diagrams. The theory underlying these derivations relies on the distinction between conditional probabilities, as in $P(\text{cancer} \mid \text{smoking})$, and interventional probabilities, as in $P(\text{cancer} \mid do(\text{smoking}))$. The former reads: "the probability of finding cancer in a person known to smoke, having started, unforced by the experimenter, to do so at an unspecified time in the past", while the latter reads: "the probability of finding cancer in a person forced by the experimenter to smoke at a specified time in the past". The former is a statistical notion that can be estimated by observation with negligible intervention by the experimenter, while the latter is a causal notion which is estimated in an experiment with an important controlled randomized intervention. It is specifically characteristic of quantal phenomena that observations defined by incompatible variables always involve important intervention by the experimenter, as described quantitatively by the observer effect. In classical thermodynamics, processes are initiated by interventions called thermodynamic operations. In other branches of science, for example astronomy, the experimenter can often observe with negligible intervention.
The theory of "causal calculus" (also known as do-calculus, Judea Pearl's Causal Calculus, Calculus of Actions) permits one to infer interventional probabilities from conditional probabilities in causal Bayesian networks with unmeasured variables. One very practical result of this theory is the characterization of confounding variables, namely, a sufficient set of variables that, if adjusted for, would yield the correct causal effect between variables of interest. It can be shown that a sufficient set for estimating the causal effect of
$X$ on $Y$ is any set of non-descendants of $X$ that $d$-separate $X$ from $Y$ after removing all arrows emanating from $X$. This criterion, called "backdoor", provides a mathematical definition of "confounding" and helps researchers identify accessible sets of variables worthy of measurement.
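The difference between a conditional and an interventional probability, and the role of a backdoor set, can be illustrated on a small example. The sketch below assumes a hypothetical three-variable graph with a single confounder, Z → X, Z → Y and X → Y, in which {Z} satisfies the backdoor criterion. It uses the standard adjustment formula $P(y \mid do(x)) = \sum_z P(y \mid x, z)\,P(z)$, applied to the observational joint distribution only, and compares the result with the value obtained by actually cutting the arrow into X. All parameter values and function names are assumptions made for illustration.

```python
from itertools import product

# Hypothetical model parameters, for illustration only.
P_Z = 0.4                                   # P(Z = 1)
P_X_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
P_Y_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y = 1 | X = x, Z = z)
                (1, 0): 0.3, (1, 1): 0.7}


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given 0/1 value."""
    return p if value == 1 else 1.0 - p


def build_joint(do_x=None):
    """Joint over (z, x, y). If do_x is set, the arrow Z -> X is cut and X is fixed."""
    table = {}
    for z, x, y in product([0, 1], repeat=3):
        if do_x is None:
            p_x = bern(P_X_GIVEN_Z[z], x)
        else:
            p_x = 1.0 if x == do_x else 0.0
        table[(z, x, y)] = bern(P_Z, z) * p_x * bern(P_Y_GIVEN_XZ[(x, z)], y)
    return table


def prob(table, event, given=lambda z, x, y: True):
    """P(event | given), computed by summing entries of the joint table."""
    numerator = sum(p for k, p in table.items() if given(*k) and event(*k))
    denominator = sum(p for k, p in table.items() if given(*k))
    return numerator / denominator


observed = build_joint()

# Conditional probability P(Y=1 | X=1): mixes the effect of X with that of Z.
p_conditional = prob(observed, lambda z, x, y: y == 1, lambda z, x, y: x == 1)

# Backdoor adjustment over Z, using only quantities from the observational joint.
p_adjusted = 0.0
for zv in (0, 1):
    p_y_given_xz = prob(observed, lambda z, x, y: y == 1,
                        lambda z, x, y: x == 1 and z == zv)
    p_z = prob(observed, lambda z, x, y: z == zv)
    p_adjusted += p_y_given_xz * p_z

# Reference value: simulate the intervention do(X=1) directly in the model.
p_interventional = prob(build_joint(do_x=1), lambda z, x, y: y == 1)

print(f"P(Y=1 | X=1)     = {p_conditional:.4f}")
print(f"adjusted for Z   = {p_adjusted:.4f}")
print(f"P(Y=1 | do(X=1)) = {p_interventional:.4f}")
```

Under these assumed parameters the conditional probability (about 0.59) overstates the interventional one (0.46), because people with Z = 1 are both more likely to have X = 1 and more likely to have Y = 1. Adjusting for Z recovers the interventional value without performing the experiment.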
=== Structure learning ===
While derivations in causal calculus rely on the structure of the causal graph, parts of the causal structure can, under certain assumptions, be learned from statistical data. The basic idea goes back to Sewall Wright's 1921 work on path analysis. A "recovery" algorithm was developed by Rebane and Pearl (1987) which rests on Wright's distinction between the three possible types of causal substructures allowed in a directed acyclic graph (DAG):
$X \rightarrow Y \rightarrow Z$
$X \leftarrow Y \rightarrow Z$
$X \rightarrow Y \leftarrow Z$
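These three patterns are commonly called a chain, a fork and a collider. A standard observation, and the kind of regularity that structure learning exploits, is that they leave different independence patterns between X and Z in the data. The sketch below is an illustration under assumed binary variables and hypothetical parameters, not the Rebane and Pearl algorithm itself: it builds an exact joint distribution for each substructure and tests whether X and Z are independent marginally and conditionally on Y.

```python
from itertools import product


def bern(p, value):
    """Probability that a Bernoulli(p) variable takes the given 0/1 value."""
    return p if value == 1 else 1.0 - p


# Each builder returns a joint table keyed by (x, y, z).
# All parameters are hypothetical and chosen only for illustration.
def chain():
    """X -> Y -> Z"""
    return {(x, y, z): bern(0.5, x) * bern(0.9 if x else 0.2, y) * bern(0.8 if y else 0.1, z)
            for x, y, z in product([0, 1], repeat=3)}


def fork():
    """X <- Y -> Z"""
    return {(x, y, z): bern(0.5, y) * bern(0.9 if y else 0.2, x) * bern(0.8 if y else 0.1, z)
            for x, y, z in product([0, 1], repeat=3)}


def collider():
    """X -> Y <- Z"""
    return {(x, y, z): bern(0.5, x) * bern(0.5, z) * bern(0.9 if (x or z) else 0.1, y)
            for x, y, z in product([0, 1], repeat=3)}


def x_z_independent(table, y=None, tol=1e-9):
    """True if X and Z are independent, optionally within the slice Y = y."""
    rows = {k: p for k, p in table.items() if y is None or k[1] == y}
    total = sum(rows.values())
    for xv, zv in product([0, 1], repeat=2):
        p_xz = sum(p for k, p in rows.items() if k[0] == xv and k[2] == zv) / total
        p_x = sum(p for k, p in rows.items() if k[0] == xv) / total
        p_z = sum(p for k, p in rows.items() if k[2] == zv) / total
        if abs(p_xz - p_x * p_z) > tol:
            return False
    return True


def signature(table):
    """(X independent of Z marginally, X independent of Z given every value of Y)."""
    marginal = x_z_independent(table)
    conditional = all(x_z_independent(table, y=v) for v in (0, 1))
    return marginal, conditional


for name, build in [("chain", chain), ("fork", fork), ("collider", collider)]:
    print(name, signature(build()))
```

In this toy setting the chain and the fork give the same signature (X and Z dependent, but independent given Y), while the collider gives the opposite one (X and Z independent, but dependent given Y). This suggests why the first two substructures are hard to tell apart from such data alone, whereas the third can be singled out.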