kb/data/en.wikipedia.org/wiki/Difference_in_differences-1.md

---
title: "Difference in differences"
chunk: 2/4
source: "https://en.wikipedia.org/wiki/Difference_in_differences"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T09:50:03.034225+00:00"
instance: "kb-cron"
---


$$
\begin{aligned}
&(\overline{y}_{11}-\overline{y}_{12})-(\overline{y}_{21}-\overline{y}_{22})\\[6pt]
={}&\big[(\gamma_{1}+\lambda_{1}+\delta D_{11}+\overline{\varepsilon}_{11})-(\gamma_{1}+\lambda_{2}+\delta D_{12}+\overline{\varepsilon}_{12})\big]\\
&\qquad{}-\big[(\gamma_{2}+\lambda_{1}+\delta D_{21}+\overline{\varepsilon}_{21})-(\gamma_{2}+\lambda_{2}+\delta D_{22}+\overline{\varepsilon}_{22})\big]\\[6pt]
={}&\delta(D_{11}-D_{12})+\delta(D_{22}-D_{21})+\overline{\varepsilon}_{11}-\overline{\varepsilon}_{12}+\overline{\varepsilon}_{22}-\overline{\varepsilon}_{21}.
\end{aligned}
$$


The strict exogeneity assumption then implies that


$$
\operatorname{E}\left[(\overline{y}_{11}-\overline{y}_{12})-(\overline{y}_{21}-\overline{y}_{22})\right] = \delta(D_{11}-D_{12})+\delta(D_{22}-D_{21}).
$$


Without loss of generality, assume that $s=2$ is the treatment group and $t=2$ is the after period. Then $D_{22}=1$ and $D_{11}=D_{12}=D_{21}=0$, giving the DID estimator

$$
\hat{\delta} = (\overline{y}_{11}-\overline{y}_{12})-(\overline{y}_{21}-\overline{y}_{22}),
$$


which can be interpreted as the treatment effect of the treatment indicated by $D_{st}$. Below it is shown how this estimator can be read as a coefficient in an ordinary least squares regression. The model described in this section is over-parametrized; to remedy that, one of the coefficients for the dummy variables can be set to 0. For example, we may set $\gamma_{1}=0$.
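As a rough illustration of the estimator, the following sketch computes the four group-period means and combines them as in the formula above. The data values and the names `outcomes` and `did_estimate` are hypothetical and chosen only for this example; the keys follow the text's convention that $s=2$ is the treatment group and $t=2$ is the after period.

```python
from statistics import mean

# Hypothetical toy data (not from any real study): outcomes keyed by (s, t),
# where s=2 is the treatment group and t=2 is the after period.
outcomes = {
    (1, 1): [10.0, 12.0, 11.0],  # control group, before
    (1, 2): [13.0, 15.0, 14.0],  # control group, after
    (2, 1): [20.0, 22.0, 21.0],  # treatment group, before
    (2, 2): [28.0, 30.0, 29.0],  # treatment group, after
}


def did_estimate(cells):
    """Return (ybar_11 - ybar_12) - (ybar_21 - ybar_22) from raw cell data."""
    ybar = {key: mean(values) for key, values in cells.items()}
    return (ybar[(1, 1)] - ybar[(1, 2)]) - (ybar[(2, 1)] - ybar[(2, 2)])


delta_hat = did_estimate(outcomes)
print(delta_hat)
```

In this toy data the control group rises by 3 between periods and the treatment group by 8, so the estimate is the difference between those two changes.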

== Assumptions ==

All the Gauss–Markov assumptions of the OLS model apply equally to DID, since DID is a special version of OLS. In addition, DID requires a parallel trend assumption. The parallel trend assumption says that $\lambda_{2}-\lambda_{1}$ is the same in both $s=1$ and $s=2$. Provided that the formal definition above accurately represents reality, this assumption automatically holds. However, a model with $\lambda_{st}:\lambda_{22}-\lambda_{21}\neq\lambda_{12}-\lambda_{11}$ may well be more realistic. In order to increase the likelihood of the parallel trend assumption holding, a difference-in-differences approach is often combined with matching. This involves "matching" known "treatment" units with simulated counterfactual "control" units: characteristically equivalent units which did not receive treatment. By defining the outcome variable as a temporal difference (change in observed outcome between pre- and post-treatment periods), and matching multiple units in a large sample on the basis of similar pre-treatment histories, the resulting ATE (i.e. the ATT: average treatment effect for the treated) provides a robust difference-in-differences estimate of treatment effects. This serves two statistical purposes: firstly, conditional on pre-treatment covariates, the parallel trends assumption is likely to hold; and secondly, this approach reduces dependence on associated ignorability assumptions necessary for valid inference.
As illustrated in the figure, the treatment effect is the difference between the observed value of $y$ and what the value of $y$ would have been with parallel trends, had there been no treatment. A shortcoming of DID arises when something other than the treatment changes in one group but not the other at the same time as the treatment, which violates the parallel trend assumption.
To guarantee the accuracy of the DID estimate, the composition of individuals of the two groups is assumed to remain unchanged over time. When using a DID model, various issues that may compromise the results, such as autocorrelation and Ashenfelter dips, must be considered and dealt with.
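
To make the role of the parallel trend assumption concrete, the derivation from the formal definition can be repeated with a group-specific period effect $\lambda_{st}$ in place of $\lambda_{t}$. This is only a sketch, assuming the model is otherwise unchanged (same $\gamma_{s}$, same $\delta$, strict exogeneity), with $s=2$ the treatment group and $t=2$ the after period:

$$
\begin{aligned}
\operatorname{E}\left[(\overline{y}_{11}-\overline{y}_{12})-(\overline{y}_{21}-\overline{y}_{22})\right]
&= (\lambda_{11}-\lambda_{12})-(\lambda_{21}-\lambda_{22})+\delta(D_{11}-D_{12})+\delta(D_{22}-D_{21})\\
&= \delta+\big[(\lambda_{22}-\lambda_{21})-(\lambda_{12}-\lambda_{11})\big].
\end{aligned}
$$

Under these assumptions the bracketed term is the bias of the DID estimator. It vanishes exactly when both groups share the same period-to-period change, which is what the parallel trend assumption requires.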

== Implementation ==
The DID method can be implemented according to the table below, where the lower right cell is the DID estimator.
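
The agreement between the table of cell means and a regression on period and group dummies can be checked numerically. The sketch below is a minimal standard-library illustration, not a reference implementation: the observations are hypothetical, and the helpers `solve_linear_system`, `ols`, and `cell_mean` are names introduced only for this example. It regresses $y$ on an intercept, a period dummy `T`, a group dummy `S`, and their product, solving the normal equations directly.

```python
from statistics import mean

# Hypothetical toy observations as (T, S, y): T=1 is the after period,
# S=1 is the treatment group. Values are made up for illustration.
data = [
    (0, 0, 10.0), (0, 0, 12.0),
    (1, 0, 13.0), (1, 0, 15.0),
    (0, 1, 20.0), (0, 1, 22.0),
    (1, 1, 28.0), (1, 1, 30.0),
]


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows):
    """OLS of y on [1, T, S, T*S] via the normal equations X'X b = X'y."""
    X = [[1.0, float(t), float(s), float(t * s)] for t, s, _ in rows]
    y = [obs for _, _, obs in rows]
    k = 4
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def cell_mean(rows, t, s):
    """Sample mean of y within the (T=t, S=s) cell."""
    return mean([obs for tt, ss, obs in rows if tt == t and ss == s])


beta = ols(data)

# Difference in differences computed directly from the four cell means.
did_from_means = (
    (cell_mean(data, 1, 1) - cell_mean(data, 0, 1))
    - (cell_mean(data, 1, 0) - cell_mean(data, 0, 0))
)

print(beta)
print(did_from_means)
```

Because the regression has one parameter per group-period cell, the interaction coefficient `beta[3]` should coincide with `did_from_means` up to floating-point error. In practice a statistics package would normally be used instead of hand-rolled normal equations, which also gives standard errors.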

Running a regression analysis gives the same result. Consider the OLS model


$$
y = \beta_{0}+\beta_{1}T+\beta_{2}S+\beta_{3}(T\cdot S)+\varepsilon
$$


where $T$ is a dummy variable for the period, equal to $1$ when $t=2$, and $S$ is a dummy variable for group membership, equal to $1$ when $s=2$. The composite variable $(T\cdot S)$ is a dummy variable indicating when $S=T=1$. Although it is not shown rigorously here, this is a proper parametrization of the model in the formal definition. Furthermore, the group and period averages in that section relate to the model parameter estimates as follows:


$$
\begin{aligned}
\hat{\beta}_{0}&=\widehat{E}(y\mid T=0,~S=0)\\[8pt]
\hat{\beta}_{1}&=\widehat{E}(y\mid T=1,~S=0)-\widehat{E}(y\mid T=0,~S=0)\\[8pt]
\hat{\beta}_{2}&=\widehat{E}(y\mid T=0,~S=1)-\widehat{E}(y\mid T=0,~S=0)\\[8pt]
\hat{\beta}_{3}&=\big[\widehat{E}(y\mid T=1,~S=1)-\widehat{E}(y\mid T=0,~S=1)\big]\\
&\qquad{}-\big[\widehat{E}(y\mid T=1,~S=0)-\widehat{E}(y\mid T=0,~S=0)\big],
\end{aligned}
$$