---
title: "Difference in differences"
chunk: 2/4
source: "https://en.wikipedia.org/wiki/Difference_in_differences"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:03.034225+00:00"
instance: "kb-cron"
---

$$
\begin{aligned}
&(\overline{y}_{11}-\overline{y}_{12})-(\overline{y}_{21}-\overline{y}_{22})\\[6pt]
={}&\big[(\gamma_{1}+\lambda_{1}+\delta D_{11}+\overline{\varepsilon}_{11})-(\gamma_{1}+\lambda_{2}+\delta D_{12}+\overline{\varepsilon}_{12})\big]\\
&\qquad{}-\big[(\gamma_{2}+\lambda_{1}+\delta D_{21}+\overline{\varepsilon}_{21})-(\gamma_{2}+\lambda_{2}+\delta D_{22}+\overline{\varepsilon}_{22})\big]\\[6pt]
={}&\delta(D_{11}-D_{12})+\delta(D_{22}-D_{21})+\overline{\varepsilon}_{11}-\overline{\varepsilon}_{12}+\overline{\varepsilon}_{22}-\overline{\varepsilon}_{21}.
\end{aligned}
$$

The strict exogeneity assumption then implies that
$$
\operatorname{E}\left[(\overline{y}_{11}-\overline{y}_{12})-(\overline{y}_{21}-\overline{y}_{22})\right]=\delta(D_{11}-D_{12})+\delta(D_{22}-D_{21}).
$$

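As a numerical sanity check of this expectation, the following Python sketch (using NumPy) repeatedly simulates the model $y=\gamma_s+\lambda_t+\delta D_{st}+\varepsilon$ used in the derivation and averages the DID of cell means. The parameter values, the standard normal errors drawn independently of $D_{st}$, the 200 observations per cell, and the choice that only cell $(2,2)$ is treated are hypothetical assumptions for illustration; the helper names `did_from_cell_means` and `simulate` are likewise made up.

```python
import numpy as np


def did_from_cell_means(y, s, t):
    """Return (ybar_11 - ybar_12) - (ybar_21 - ybar_22); s and t take values in {1, 2}."""
    ybar = {(g, p): y[(s == g) & (t == p)].mean() for g in (1, 2) for p in (1, 2)}
    return (ybar[1, 1] - ybar[1, 2]) - (ybar[2, 1] - ybar[2, 2])


def simulate(rng, n_per_cell, gamma, lam, delta, D):
    """Draw y = gamma_s + lambda_t + delta * D_st + eps, eps ~ N(0, 1) independent of D."""
    s = np.repeat([1, 1, 2, 2], n_per_cell)
    t = np.repeat([1, 2, 1, 2], n_per_cell)
    d = D[s - 1, t - 1]  # look up D_st for every observation
    y = gamma[s - 1] + lam[t - 1] + delta * d + rng.normal(0.0, 1.0, size=s.size)
    return y, s, t


rng = np.random.default_rng(0)
gamma = np.array([1.0, 3.0])             # hypothetical group effects gamma_1, gamma_2
lam = np.array([0.5, 2.0])               # hypothetical period effects lambda_1, lambda_2
delta = 1.5                              # hypothetical treatment effect
D = np.array([[0.0, 0.0], [0.0, 1.0]])   # D_st: only cell (2, 2) is treated

estimates = [did_from_cell_means(*simulate(rng, 200, gamma, lam, delta, D)) for _ in range(500)]
# Right-hand side of the expectation: delta*(D_11 - D_12) + delta*(D_22 - D_21).
expected = delta * (D[0, 0] - D[0, 1]) + delta * (D[1, 1] - D[1, 0])
print(np.mean(estimates), expected)
```

The average of the simulated estimates should land close to `expected`; any single estimate scatters around it, because the $\overline{\varepsilon}$ terms only vanish in expectation.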
Without loss of generality, assume that $s=2$ is the treatment group and $t=2$ is the after period. Then $D_{22}=1$ and $D_{11}=D_{12}=D_{21}=0$, so the right-hand side above reduces to $\delta$, giving the DID estimator
$$
\hat{\delta}=(\overline{y}_{11}-\overline{y}_{12})-(\overline{y}_{21}-\overline{y}_{22}),
$$
which can be interpreted as the effect of the treatment indicated by $D_{st}$. Below it is shown how this estimator can be read as a coefficient in an ordinary least squares regression. The model described in this section is over-parametrized: adding a constant to both $\gamma$'s and subtracting it from both $\lambda$'s leaves every cell mean unchanged. To remedy this, one of the coefficients of the dummy variables can be set to 0; for example, we may set $\gamma_1=0$.

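To see the over-parametrization concretely, the sketch below (Python with NumPy, hypothetical cell means) builds one row of dummies per $(s,t)$ cell. With columns for both $\gamma$'s and both $\lambda$'s, the group dummies and the period dummies each sum to a column of ones, so the design loses one rank. Dropping the $\gamma_1$ column (the normalization $\gamma_1=0$) restores full rank, and solving the resulting four equations returns a $\delta$ coefficient equal to $\hat\delta$. Because the normalized model has four parameters for four cells, it is saturated, so solving on the cell means gives the same coefficients OLS would give on individual-level data.

```python
import numpy as np

# Hypothetical cell means ybar_st in the order (1,1), (1,2), (2,1), (2,2).
ybar = np.array([1.0, 2.0, 3.0, 6.0])
s = np.array([1, 1, 2, 2])
t = np.array([1, 2, 1, 2])
D = ((s == 2) & (t == 2)).astype(float)  # only cell (2, 2) is treated

g1, g2 = (s == 1).astype(float), (s == 2).astype(float)
l1, l2 = (t == 1).astype(float), (t == 2).astype(float)

# Full dummy design: columns for gamma_1, gamma_2, lambda_1, lambda_2, delta.
# g1 + g2 and l1 + l2 are both all ones, so the columns are linearly dependent.
X_full = np.column_stack([g1, g2, l1, l2, D])
rank_full = np.linalg.matrix_rank(X_full)  # 4, one less than the 5 columns

# Normalize gamma_1 = 0 by dropping its column; the design becomes square and full rank.
X_reduced = np.column_stack([g2, l1, l2, D])
rank_reduced = np.linalg.matrix_rank(X_reduced)  # 4
gamma2, lam1, lam2, delta_hat = np.linalg.solve(X_reduced, ybar)

# DID of cell means, for comparison with the delta coefficient.
did = (ybar[0] - ybar[1]) - (ybar[2] - ybar[3])
print(rank_full, rank_reduced, delta_hat, did)
```

Any other single normalization (for example $\lambda_1=0$) would also restore full rank; it changes the other coefficients but not $\hat\delta$.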
== Assumptions ==

All the Gauss–Markov assumptions of the OLS model apply equally to DID, since DID is a special case of OLS regression. In addition, DID requires a parallel trend assumption: the change $\lambda_2-\lambda_1$ is the same for both $s=1$ and $s=2$. If the formal definition above accurately represents reality, this assumption holds automatically, because $\lambda_t$ does not depend on $s$. However, a model with group-specific period effects $\lambda_{st}$, where $\lambda_{22}-\lambda_{21}\neq\lambda_{12}-\lambda_{11}$, may well be more realistic.

To increase the likelihood that the parallel trend assumption holds, a difference-in-differences approach is often combined with matching. This involves "matching" known "treatment" units with simulated counterfactual "control" units: characteristically equivalent units that did not receive treatment. By defining the outcome variable as a temporal difference (the change in observed outcome between the pre- and post-treatment periods), and by matching multiple units in a large sample on the basis of similar pre-treatment histories, the resulting ATE (i.e. the ATT: average treatment effect for the treated) provides a robust difference-in-differences estimate of treatment effects. This serves two statistical purposes: first, conditional on pre-treatment covariates, the parallel trends assumption is likely to hold; second, the approach reduces dependence on the associated ignorability assumptions necessary for valid inference.

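The cost of violating parallel trends can be made explicit. Repeating the derivation above with group-specific period effects $\lambda_{st}$, and with only cell $(2,2)$ treated, gives an expected DID of $\delta+(\lambda_{22}-\lambda_{21})-(\lambda_{12}-\lambda_{11})$: the $\gamma_s$ terms still cancel, but the trend gap does not. The Python sketch below evaluates the noise-free expected cell means for two hypothetical sets of $\lambda_{st}$ values; the function names and numbers are illustrative assumptions.

```python
def expected_did(gamma, lam, delta):
    """Expected DID of cell means when period effects lam[(s, t)] may differ by group.

    Noise is assumed to have mean zero, so each expected cell mean is
    gamma[s] + lam[(s, t)] + delta * D_st, with only cell (2, 2) treated.
    """
    mu = {
        (s, t): gamma[s] + lam[(s, t)] + delta * (s == 2 and t == 2)
        for s in (1, 2)
        for t in (1, 2)
    }
    return (mu[1, 1] - mu[1, 2]) - (mu[2, 1] - mu[2, 2])


def trend_gap(lam):
    """(lambda_22 - lambda_21) - (lambda_12 - lambda_11): the bias added to delta."""
    return (lam[2, 2] - lam[2, 1]) - (lam[1, 2] - lam[1, 1])


gamma = {1: 1.0, 2: 3.0}  # hypothetical group effects
delta = 1.5               # hypothetical treatment effect

# Parallel trends: both groups' period effects rise by 1.5.
lam_parallel = {(1, 1): 0.5, (1, 2): 2.0, (2, 1): 0.5, (2, 2): 2.0}
# Non-parallel: the treated group's period effect rises by an extra 0.8.
lam_diverging = {(1, 1): 0.5, (1, 2): 2.0, (2, 1): 0.5, (2, 2): 2.8}

print(expected_did(gamma, lam_parallel, delta))   # equals delta
print(expected_did(gamma, lam_diverging, delta))  # about 2.3, i.e. delta + 0.8
```

Nothing in the four cell means distinguishes the extra 0.8 from a genuine treatment effect, which is why the assumption has to be argued for (or made more plausible, e.g. by matching) rather than tested from these data alone.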
As illustrated in the figure, the treatment effect is the difference between the observed value of y and the value y would have taken under parallel trends had there been no treatment. A shortcoming of DID arises when something other than the treatment changes in one group but not the other at the same time as the treatment; this violates the parallel trend assumption.

To ensure the accuracy of the DID estimate, the composition of individuals in the two groups is assumed to remain unchanged over time. When using a DID model, issues that may compromise the results, such as autocorrelation and Ashenfelter dips, must be considered and addressed.

== Implementation ==
The DID method can be implemented according to the table below, where the lower right cell is the DID estimator.

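A minimal sketch of this table computation in Python with NumPy follows. The table itself is not reproduced in this section, so the layout used here (rows for the control and treatment groups, columns for the before and after periods, the DID in the lower-right position, stored as `did`) is an assumption, as are the simulated data. The last lines fit $y=\beta_0+\beta_1T+\beta_2S+\beta_3(T\cdot S)+\varepsilon$ by least squares with `numpy.linalg.lstsq` to check that $\hat\beta_3$ matches the lower-right cell.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100  # hypothetical observations per cell
S = np.repeat([0, 0, 1, 1], n)  # 1 = treatment group (s = 2)
T = np.repeat([0, 1, 0, 1], n)  # 1 = after period (t = 2)
# Hypothetical data-generating process with a treatment effect of 1.5.
y = 2.0 + 0.7 * T + 1.2 * S + 1.5 * T * S + rng.normal(0.0, 1.0, size=S.size)


def cell_mean(s, t):
    """Sample mean of y in group s and period t."""
    return y[(S == s) & (T == t)].mean()


# 2x2 table of means: rows are groups (control, treatment), columns are periods (before, after).
table = np.array([[cell_mean(s, t) for t in (0, 1)] for s in (0, 1)])
after_minus_before = table[:, 1] - table[:, 0]       # change within each group
treated_minus_control = table[1, :] - table[0, :]    # group gap within each period
did = after_minus_before[1] - after_minus_before[0]  # lower-right cell

# OLS fit of y = b0 + b1*T + b2*S + b3*(T*S) + e.
X = np.column_stack([np.ones(S.size), T, S, T * S])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(table, did, beta[3])
```

Since this regression has one parameter per cell, it fits the four cell means exactly, so $\hat\beta_3$ agrees with the table's DID up to floating-point error for any data, not only this simulated sample.
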
Running a regression analysis gives the same result. Consider the OLS model

$$
y=\beta_{0}+\beta_{1}T+\beta_{2}S+\beta_{3}(T\cdot S)+\varepsilon
$$
where $T$ is a dummy variable for the period, equal to $1$ when $t=2$, and $S$ is a dummy variable for group membership, equal to $1$ when $s=2$. The composite variable $(T\cdot S)$ is a dummy variable equal to $1$ when $S=T=1$. Although it is not shown rigorously here, this is a proper parametrization of the model in the formal definition. Furthermore, the group and period averages in that section relate to the model parameter estimates as follows:

$$
\begin{aligned}
\hat{\beta}_{0}&=\widehat{E}(y\mid T=0,~S=0)\\[8pt]
\hat{\beta}_{1}&=\widehat{E}(y\mid T=1,~S=0)-\widehat{E}(y\mid T=0,~S=0)\\[8pt]
\hat{\beta}_{2}&=\widehat{E}(y\mid T=0,~S=1)-\widehat{E}(y\mid T=0,~S=0)\\[8pt]
\hat{\beta}_{3}&=\big[\widehat{E}(y\mid T=1,~S=1)-\widehat{E}(y\mid T=0,~S=1)\big]\\
&\qquad{}-\big[\widehat{E}(y\mid T=1,~S=0)-\widehat{E}(y\mid T=0,~S=0)\big],
\end{aligned}
$$