---
title: "Difference in differences"
chunk: 3/4
source: "https://en.wikipedia.org/wiki/Difference_in_differences"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:03.034225+00:00"
instance: "kb-cron"
---

where $\widehat{E}(\dots \mid \dots)$ stands for conditional averages computed on the sample; for example, $T=1$ indicates the after period and $S=0$ indicates the control group. Note that $\hat{\beta}_1$
is an estimate of the counterfactual rather than the impact of the control group. The control group is often used as a proxy for the counterfactual (see Synthetic control method for a deeper understanding of this point). Thereby, $\hat{\beta}_1$ can be interpreted as the impact of both the control group and the intervention's (treatment's) counterfactual. Similarly, $\hat{\beta}_2$, due to the parallel trend assumption, is also the same differential between the treatment and control group in $T=1$. The above descriptions should not be construed to imply the (average) effect of only the control group, for $\hat{\beta}_1$, or only the difference of the treatment and control groups in the pre-period, for $\hat{\beta}_2$. As in Card and Krueger, below, a first (time) difference of the outcome variable $(\Delta Y_i = Y_{i,1} - Y_{i,0})$
eliminates the need for the time trend (i.e., $\hat{\beta}_1$) to form an unbiased estimate of $\hat{\beta}_3$, implying that $\hat{\beta}_1$
is not actually conditional on the treatment or control group. Consistently, a difference between the treatment and control groups would eliminate the need for treatment differentials (i.e., $\hat{\beta}_2$) to form an unbiased estimate of $\hat{\beta}_3$. This nuance is important to understand when the analyst believes (weak) violations of parallel pre-trends exist, or when the assumptions behind the counterfactual approximation are violated because of non-common shocks or confounding events.

To see the relation between this notation and the previous section, consider as above only one observation per time period for each group; then


$$
\begin{aligned}
\widehat{E}(y \mid T=1,\ S=0) &= \widehat{E}(y \mid \text{after period, control}) \\
&= \frac{\widehat{E}\bigl(y\, I(\text{after period, control})\bigr)}{\widehat{P}(\text{after period, control})} \\
&= \frac{\sum_{i=1}^{n} y_{i,\text{after}}\, I(i \text{ in control})}{n_{\text{control}}} = \overline{y}_{\text{control, after}} \\
&= \overline{y}_{12}
\end{aligned}
$$


and so on for other values of $T$ and $S$, which is equivalent to


$$
\hat{\beta}_3 = (y_{11} - y_{21}) - (y_{12} - y_{22}).
$$


This is exactly the expression for the treatment effect that was given in the formal definition and in the above table.

Variants of the difference-in-differences framework include designs for staggered implementation of treatment, as well as an estimator for multiple time periods and other variations introduced by Brantly Callaway and Pedro H.C. Sant'Anna.
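
The equivalence described above, between the interaction coefficient, the difference of cell means, and the first-difference regression, can be checked numerically. The following is a minimal sketch on simulated two-period panel data, assuming the regression specification $y = \beta_0 + \beta_1 T + \beta_2 S + \beta_3 (T \cdot S) + \varepsilon$ implied by the notation in this section. The sample size, the unit effects, and the value `true_effect = 3.0` are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-period panel: n units, first half treated (S=1), rest control (S=0).
n = 200
s = (np.arange(n) < n // 2).astype(float)
unit_effect = rng.normal(0.0, 1.0, n)   # time-invariant unit heterogeneity
true_effect = 3.0                       # assumed treatment effect (illustrative)

# Outcomes before (T=0) and after (T=1); a common time trend of 2.0 hits both groups.
y_before = 1.0 + 0.5 * s + unit_effect + rng.normal(0.0, 0.1, n)
y_after = 1.0 + 2.0 + 0.5 * s + true_effect * s + unit_effect + rng.normal(0.0, 0.1, n)

# 1. Difference of the four cell means.
did_means = (y_after[s == 1].mean() - y_before[s == 1].mean()) - (
    y_after[s == 0].mean() - y_before[s == 0].mean()
)

# 2. Pooled OLS with columns [1, T, S, T*S]; beta[3] is the interaction coefficient.
y = np.concatenate([y_before, y_after])
T = np.concatenate([np.zeros(n), np.ones(n)])
S = np.concatenate([s, s])
X = np.column_stack([np.ones(2 * n), T, S, T * S])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# 3. First-difference regression: regress (y_after - y_before) on [1, S].
#    The intercept plays the role of the time trend, the slope that of beta_3.
dy = y_after - y_before
Xd = np.column_stack([np.ones(n), s])
gamma, *_ = np.linalg.lstsq(Xd, dy, rcond=None)

print("cell means      :", did_means)
print("OLS interaction :", beta[3])
print("first difference:", gamma[1])
```

Because the pooled model is saturated in $T$ and $S$, the three numbers agree up to floating-point error. The first-difference regression removes the unit effects and the group differential without estimating them separately, which is the point made about $\hat{\beta}_2$ above.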

== Example ==
The Card and Krueger article on minimum wage in New Jersey, published in 1994, is considered one of the most famous DID studies; Card was later awarded the 2021 Nobel Memorial Prize in Economic Sciences in part for this and related work. Card and Krueger compared employment in the fast-food sector in New Jersey and in Pennsylvania, in February 1992 and in November 1992, after New Jersey's minimum wage rose from $4.25 to $5.05 in April 1992. Observing a change in employment in New Jersey only, before and after the treatment, would fail to control for omitted variables such as weather and macroeconomic conditions of the region. By including Pennsylvania as a control in a difference-in-differences model, any bias caused by variables common to New Jersey and Pennsylvania is implicitly controlled for, even when these variables are unobserved. Assuming that New Jersey and Pennsylvania have parallel trends over time, Pennsylvania's change in employment can be interpreted as the change New Jersey would have experienced had it not increased the minimum wage, and vice versa. The evidence suggested that the increased minimum wage did not induce a decrease in employment in New Jersey, contrary to what some economic theory would suggest. The table below shows Card and Krueger's estimates of the treatment effect on employment, measured in full-time equivalents (FTEs). Card and Krueger estimate that the $0.80 minimum wage increase in New Jersey led to an average increase in employment of 2.75 FTEs per store.
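
The arithmetic behind the 2.75 FTE figure can be sketched as follows. The four group means used here are the values commonly reported from the study and are not stated in the text above, so treat them as assumptions; the variable names are illustrative.

```python
# Mean FTE employment per store (assumed values, commonly reported
# from Card and Krueger 1994; not taken from the text above).
nj_before, nj_after = 20.44, 21.03   # New Jersey (treatment group)
pa_before, pa_after = 23.33, 21.17   # Pennsylvania (control group)

nj_change = nj_after - nj_before     # change in the treated state
pa_change = pa_after - pa_before     # change in the control state

# Difference in differences: treated change minus control change.
did = nj_change - pa_change

print(round(nj_change, 2), round(pa_change, 2), round(did, 2))
# prints: 0.59 -2.16 2.75
```

Under these figures, the positive estimate comes mostly from the decline in Pennsylvania rather than from growth in New Jersey. This is why the parallel trends assumption carries so much weight in interpreting the result.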

A software example applying this research is available in Stata's `diff` command.

== Applications ==
The difference-in-differences (DID) framework has been applied widely beyond labor economics and minimum wage studies. In public health, DID has been used to evaluate the effect of new medical guidelines or vaccination campaigns by comparing regions before and after policy implementation. In education, DID methods help measure the impact of reforms such as changes in school funding or class size. In environmental economics, they are used to assess regulations on pollution, energy consumption, or climate policy. These applications all rely on the key assumption of parallel trends; when that assumption is plausible and the study is carefully designed, they can provide policymakers with credible causal estimates from observational data.