kb/data/en.wikipedia.org/wiki/Coefficient_of_determination-5.md

---
title: "Coefficient of determination"
chunk: 6/6
source: "https://en.wikipedia.org/wiki/Coefficient_of_determination"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T07:23:31.318214+00:00"
instance: "kb-cron"
---

It is assumed that the matrix $X$ is standardized with Z-scores and that the column vector $y$ is centered to have a mean of zero. Let the column vector $\beta_{0}$ refer to the hypothesized regression parameters and let the column vector $b$ denote the estimated parameters. We can then define

$$R^{2}=1-\frac{(y-Xb)'(y-Xb)}{(y-X\beta_{0})'(y-X\beta_{0})}.$$


An $R^2$ of 75% means that the in-sample accuracy improves by 75% if the data-optimized $b$ solutions are used instead of the hypothesized $\beta_{0}$ values. In the special case that $\beta_{0}$ is a vector of zeros, we obtain the traditional $R^2$ again.
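
The definition above can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function name `r2_vs_hypothesis`, the simulated data and the coefficient values are hypothetical and chosen only for illustration. It evaluates the ratio of residual sums of squares for a zero hypothesis and for a hypothesis close to the data-generating coefficients.

```python
import numpy as np

def r2_vs_hypothesis(X, y, beta0):
    """R^2 of the least-squares fit relative to hypothesized coefficients beta0."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)    # data-optimized estimates b
    resid_fit = y - X @ b                        # residuals under b
    resid_hyp = y - X @ beta0                    # residuals under the hypothesis
    return 1.0 - (resid_fit @ resid_fit) / (resid_hyp @ resid_hyp)

# Simulated data (an assumption made for this example only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # Z-score each column of X
y = X @ np.array([1.0, -0.5]) + rng.normal(size=200)
y = y - y.mean()                                 # center y

r2_zero = r2_vs_hypothesis(X, y, np.zeros(2))            # hypothesis: all zeros
r2_near = r2_vs_hypothesis(X, y, np.array([1.0, -0.5]))  # hypothesis near the truth
```

With the zero hypothesis the denominator equals the total sum of squares of the centered $y$, so `r2_zero` coincides with the traditional $R^2$; a hypothesis that already fits well leaves less room for improvement, so `r2_near` is smaller.
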
The individual effect on $R^2$ of deviating from a hypothesis can be computed with $R^{\otimes}$ ('R-outer'). This $p \times p$ matrix is given by

$$R^{\otimes}=(X'\tilde{y}_{0})(X'\tilde{y}_{0})'(X'X)^{-1}(\tilde{y}_{0}'\tilde{y}_{0})^{-1},$$


where $\tilde{y}_{0}=y-X\beta_{0}$. The diagonal elements of $R^{\otimes}$ exactly add up to $R^2$. If regressors are uncorrelated and $\beta_{0}$ is a vector of zeros, then the $j^{\text{th}}$ diagonal element of $R^{\otimes}$ simply corresponds to the $r^2$ value between $x_{j}$ and $y$. When regressors $x_{i}$ and $x_{j}$ are correlated, $R_{ii}^{\otimes}$ might increase at the cost of a decrease in $R_{jj}^{\otimes}$. As a result, the diagonal elements of $R^{\otimes}$ may be smaller than 0 and, in more exceptional cases, larger than 1. To deal with such uncertainties, several shrinkage estimators implicitly take a weighted average of the diagonal elements of $R^{\otimes}$ to quantify the relevance of deviating from a hypothesized value. See the lasso for an example.
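
The property that the diagonal of the R-outer matrix sums to $R^2$ can be checked numerically. The sketch below assumes NumPy; the function name `r_outer` and the simulated, deliberately correlated regressors are hypothetical and serve only as an illustration of the formula.

```python
import numpy as np

def r_outer(X, y, beta0):
    """p x p 'R-outer' matrix for hypothesized coefficients beta0."""
    y0 = y - X @ beta0                       # y-tilde_0 = y - X beta0
    g = X.T @ y0                             # X' y-tilde_0, a length-p vector
    # (X'y0)(X'y0)' (X'X)^{-1} (y0'y0)^{-1}; the last factor is a scalar
    return np.outer(g, g) @ np.linalg.inv(X.T @ X) / (y0 @ y0)

# Simulated data (an assumption made for this example only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
X[:, 1] += 0.8 * X[:, 0]                     # make two regressors correlated
X = (X - X.mean(axis=0)) / X.std(axis=0)     # Z-score each column of X
y = X @ np.array([0.7, 0.0, -0.4]) + rng.normal(size=300)
y = y - y.mean()                             # center y

beta0 = np.zeros(3)
R = r_outer(X, y, beta0)

# R^2 relative to beta0, computed directly from the two residual sums of squares.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1.0 - np.sum((y - X @ b) ** 2) / np.sum((y - X @ beta0) ** 2)
```

Here `np.trace(R)` agrees with `r2` up to floating-point error, while the individual entries of `np.diag(R)` show how the total is split across the regressors.
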

=== $R^2$ in logistic regression ===
In the case of logistic regression, usually fit by maximum likelihood, there are several choices of pseudo-$R^2$.
One is the generalized $R^2$ originally proposed by Cox & Snell, and independently by Magee:


$$R^{2}=1-\left(\frac{\mathcal{L}(0)}{\mathcal{L}(\widehat{\theta})}\right)^{2/n}$$

where $\mathcal{L}(0)$ is the likelihood of the model with only the intercept, $\mathcal{L}(\widehat{\theta})$ is the likelihood of the estimated model (i.e., the model with a given set of parameter estimates) and $n$ is the sample size. It is easily rewritten to:

$$R^{2}=1-e^{\frac{2}{n}(\ln(\mathcal{L}(0))-\ln(\mathcal{L}(\widehat{\theta})))}=1-e^{-D/n}$$


where D is the test statistic of the likelihood ratio test.
Nico Nagelkerke noted that it had the following properties:

It is consistent with the classical coefficient of determination when both can be computed;
Its value is maximised by the maximum likelihood estimation of a model;
It is asymptotically independent of the sample size;
The interpretation is the proportion of the variation explained by the model;
The values are between 0 and 1, with 0 denoting that the model does not explain any variation and 1 denoting that it perfectly explains the observed variation;
It does not have any unit.
However, in the case of a logistic model, where $\mathcal{L}(\widehat{\theta})$ cannot be greater than 1, $R^2$ is between 0 and $R_{\max}^{2}=1-(\mathcal{L}(0))^{2/n}$: thus, Nagelkerke suggested the possibility to define a scaled $R^2$ as $R^2/R_{\max}^{2}$.
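
Both the generalized $R^2$ and its scaled version can be computed from log-likelihoods, which avoids underflow when the likelihoods themselves are very small. The following is a minimal sketch using only the Python standard library; the function names and the log-likelihood values are hypothetical and do not come from any fitted model.

```python
import math

def cox_snell_r2(loglik_null, loglik_model, n):
    """Generalized R^2 = 1 - (L(0) / L(theta_hat))^(2/n), from log-likelihoods."""
    return 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))

def nagelkerke_r2(loglik_null, loglik_model, n):
    """Generalized R^2 divided by its maximum, R^2_max = 1 - L(0)^(2/n)."""
    r2_max = 1.0 - math.exp((2.0 / n) * loglik_null)
    return cox_snell_r2(loglik_null, loglik_model, n) / r2_max

# Hypothetical log-likelihoods for a logistic model with n = 100 observations.
n = 100
ll_null, ll_model = -68.0, -55.0
D = 2.0 * (ll_model - ll_null)       # likelihood-ratio test statistic
r2_cs = cox_snell_r2(ll_null, ll_model, n)
r2_n = nagelkerke_r2(ll_null, ll_model, n)
```

In this sketch `r2_cs` equals `1 - math.exp(-D / n)`, matching the rewritten form of the formula, and the scaled value `r2_n` reaches 1 when the fitted log-likelihood is 0, the upper limit for a logistic model.
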

== Comparison with residual statistics ==
Occasionally, residual statistics are used for indicating goodness of fit. The norm of residuals is calculated as the square-root of the sum of squares of residuals (SSR):


$$\text{norm of residuals}=\sqrt{SS_{\text{res}}}=\|e\|.$$


Similarly, the reduced chi-square is calculated as the SSR divided by the degrees of freedom.
Both $R^2$ and the norm of residuals have their relative merits. For least squares analysis $R^2$ varies between 0 and 1, with larger numbers indicating better fits and 1 representing a perfect fit. The norm of residuals varies from 0 to infinity with smaller numbers indicating better fits and zero indicating a perfect fit. One advantage and disadvantage of $R^2$ is that the $SS_{\text{tot}}$ term acts to normalize the value. If the $y_{i}$ values are all multiplied by a constant, the norm of residuals will also change by that constant but $R^2$ will stay the same. As a basic example, for the linear least squares fit to the set of data:

$R^2$ = 0.998, and norm of residuals = 0.302.
If all values of $y$ are multiplied by 1000 (for example, in an SI prefix change), then $R^2$ remains the same, but norm of residuals = 302.
Another single-parameter indicator of fit is the RMSE of the residuals, or standard deviation of the residuals. This would have a value of 0.135 for the above example given that the fit was linear with an unforced intercept.
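
The scaling behaviour described in this section can be reproduced with a short calculation. The sketch below uses only the Python standard library. The five data points are an assumption, chosen because they reproduce the figures quoted above; the function name `fit_stats` is hypothetical. The RMSE is computed here with the number of observations as the divisor.

```python
import math

# Assumed data: these values are an assumption, chosen because they
# reproduce the quoted R^2, norm of residuals and RMSE.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.9, 3.7, 5.8, 8.0, 9.6]

def fit_stats(x, y):
    """Least-squares line with intercept; returns (R^2, norm of residuals, RMSE)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res), math.sqrt(ss_res / n)

r2, norm, rmse = fit_stats(x, y)

# Rescale y by 1000, as in an SI prefix change.
r2_k, norm_k, rmse_k = fit_stats(x, [1000.0 * yi for yi in y])
```

Rounded to three decimals, `r2`, `norm` and `rmse` match the values 0.998, 0.302 and 0.135 given in the text; after rescaling, `r2_k` is unchanged while `norm_k` is 1000 times `norm`.
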

== History ==
The creation of the coefficient of determination has been attributed to the geneticist Sewall Wright and was first published in 1921.

== See also ==
Anscombe's quartet
Fraction of variance unexplained
Goodness of fit
Nash–Sutcliffe model efficiency coefficient (hydrological applications)
Pearson product-moment correlation coefficient
Proportional reduction in loss
Regression model validation
Root mean square deviation
Stepwise regression

== Further reading ==
Gujarati, Damodar N.; Porter, Dawn C. (2009). Basic Econometrics (Fifth ed.). New York: McGraw-Hill/Irwin. pp. 73–78. ISBN 978-0-07-337577-9.
Hughes, Ann; Grawoig, Dennis (1971). Statistics: A Foundation for Analysis. Reading: Addison-Wesley. pp. 344–348. ISBN 0-201-03021-7.
Kmenta, Jan (1986). Elements of Econometrics (Second ed.). New York: Macmillan. pp. 240–243. ISBN 978-0-02-365070-3.
Lewis-Beck, Michael S.; Skalaban, Andrew (1990). "The R-Squared: Some Straight Talk". Political Analysis. 2: 153–171. doi:10.1093/pan/2.1.153. JSTOR 23317769.