| title | chunk | source | category | tags | date_saved | instance |
| --- | --- | --- | --- | --- | --- | --- |
| Coefficient of determination | 6/6 | https://en.wikipedia.org/wiki/Coefficient_of_determination | reference | science, encyclopedia | 2026-05-05T07:23:31.318214+00:00 | kb-cron |

It is assumed that the matrix $X$ is standardized with Z-scores and that the column vector $y$ is centered to have a mean of zero. Let the column vector $\beta_{0}$ refer to the hypothesized regression parameters and let the column vector $b$ denote the estimated parameters. We can then define

$$R^{2}=1-\frac{(y-Xb)'(y-Xb)}{(y-X\beta_{0})'(y-X\beta_{0})}.$$
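As a rough illustration of this definition, the following Python sketch computes $R^2$ relative to a hypothesized $\beta_{0}$ for two regressors, using only the standard library. The data values, the helper names (`standardize`, `ols_two_regressors`, `r2_vs_hypothesis`) and the restriction to two regressors are assumptions made for the example and do not come from the source.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def standardize(col):
    """Z-score a column using the population standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def center(col):
    mean = sum(col) / len(col)
    return [v - mean for v in col]

def ols_two_regressors(x1, x2, y):
    """Solve the 2x2 normal equations (X'X) b = X'y by Cramer's rule."""
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    c1, c2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 * a12
    return [(c1 * a22 - c2 * a12) / det, (a11 * c2 - a12 * c1) / det]

def residual_ss(x1, x2, y, beta):
    """(y - X beta)'(y - X beta) for a two-column X."""
    res = [yi - beta[0] * u - beta[1] * v for u, v, yi in zip(x1, x2, y)]
    return dot(res, res)

def r2_vs_hypothesis(x1, x2, y, beta0):
    """R^2 = 1 - RSS(b) / RSS(beta0), with b the least squares estimate."""
    b = ols_two_regressors(x1, x2, y)
    return 1.0 - residual_ss(x1, x2, y, b) / residual_ss(x1, x2, y, beta0)

# Made-up illustrative data (an assumption, not from the article).
x1 = standardize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = standardize([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = center([1.2, 1.9, 3.6, 3.9, 5.8, 6.1])

# beta0 = 0 gives the traditional R^2, because y is centered.
r2_traditional = r2_vs_hypothesis(x1, x2, y, [0.0, 0.0])
# A nonzero hypothesis changes the denominator only.
r2_alt = r2_vs_hypothesis(x1, x2, y, [1.0, 0.5])
print(r2_traditional, r2_alt)
```

Because the least squares estimate minimizes the residual sum of squares, the value returned here lies between 0 and 1 for any hypothesis, and it is 0 when the hypothesis coincides with the estimate.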

An $R^2$ of 75% means that the in-sample accuracy improves by 75% if the data-optimized $b$ solutions are used instead of the hypothesized $\beta_{0}$ values. In the special case that $\beta_{0}$ is a vector of zeros, we obtain the traditional $R^2$ again. The individual effect on $R^2$ of deviating from a hypothesis can be computed with $R^{\otimes}$ ('R-outer'). This $p \times p$ matrix is given by

$$R^{\otimes}=(X'\tilde{y}_{0})(X'\tilde{y}_{0})'(X'X)^{-1}(\tilde{y}_{0}'\tilde{y}_{0})^{-1},$$

where $\tilde{y}_{0}=y-X\beta_{0}$. The diagonal elements of $R^{\otimes}$ exactly add up to $R^2$. If regressors are uncorrelated and $\beta_{0}$ is a vector of zeros, then the $j^{\text{th}}$ diagonal element of $R^{\otimes}$ simply corresponds to the $r^2$ value between $x_{j}$ and $y$. When regressors $x_{i}$ and $x_{j}$ are correlated, $R_{ii}^{\otimes}$ might increase at the cost of a decrease in $R_{jj}^{\otimes}$. As a result, the diagonal elements of $R^{\otimes}$ may be smaller than 0 and, in more exceptional cases, larger than 1. To deal with such uncertainties, several shrinkage estimators implicitly take a weighted average of the diagonal elements of $R^{\otimes}$ to quantify the relevance of deviating from a hypothesized value. See lasso for an example.
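The two claims about the diagonal of the R-outer matrix can be checked numerically. The sketch below is a minimal standard-library Python version restricted to two regressors. The function names (`r_outer`, `r2_from_residuals`, `pearson_r2`) and all data values are assumptions introduced for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def standardize(col):
    """Z-score a column using the population standard deviation."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def r_outer(x1, x2, y, beta0):
    """2x2 matrix (X'y0)(X'y0)'(X'X)^-1 (y0'y0)^-1 with y0 = y - X beta0."""
    y0 = [yi - beta0[0] * u - beta0[1] * v for u, v, yi in zip(x1, x2, y)]
    g = [dot(x1, y0), dot(x2, y0)]                            # X'y0
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    det = a11 * a22 - a12 * a12
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]  # (X'X)^-1
    scale = dot(y0, y0)                                       # y0'y0
    return [[(g[i] * g[0] * inv[0][j] + g[i] * g[1] * inv[1][j]) / scale
             for j in range(2)] for i in range(2)]

def r2_from_residuals(x1, x2, y, beta0):
    """R^2 = 1 - RSS(b) / RSS(beta0), with b from the 2x2 normal equations."""
    a11, a12, a22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    c1, c2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 * a12
    b = [(c1 * a22 - c2 * a12) / det, (a11 * c2 - a12 * c1) / det]

    def rss(beta):
        res = [yi - beta[0] * u - beta[1] * v for u, v, yi in zip(x1, x2, y)]
        return dot(res, res)

    return 1.0 - rss(b) / rss(beta0)

def pearson_r2(x, y):
    """Squared correlation for inputs that are already centered."""
    return dot(x, y) ** 2 / (dot(x, x) * dot(y, y))

# Case 1: correlated regressors and a nonzero hypothesis (made-up data).
x1 = standardize([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = standardize([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
raw_y = [1.2, 1.9, 3.6, 3.9, 5.8, 6.1]
y = [v - sum(raw_y) / len(raw_y) for v in raw_y]
beta0 = [1.0, 0.5]
R = r_outer(x1, x2, y, beta0)
trace = R[0][0] + R[1][1]
r2 = r2_from_residuals(x1, x2, y, beta0)

# Case 2: uncorrelated, already standardized regressors and beta0 = 0.
u1 = [-1.0, 1.0, -1.0, 1.0]
u2 = [-1.0, -1.0, 1.0, 1.0]
yc = [-1.5, 0.0, -0.5, 2.0]   # centered response
R_unc = r_outer(u1, u2, yc, [0.0, 0.0])
print(trace, r2, R_unc[0][0], R_unc[1][1])
```

In the first case the trace should agree with the residual-based $R^2$ up to rounding error. In the second case each diagonal element should agree with the squared correlation between the corresponding regressor and the response.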

=== R2 in logistic regression ===

In the case of logistic regression, usually fit by maximum likelihood, there are several choices of pseudo-$R^2$. One is the generalized $R^2$ originally proposed by Cox & Snell, and independently by Magee:

$$R^{2}=1-\left(\frac{\mathcal{L}(0)}{\mathcal{L}(\widehat{\theta})}\right)^{2/n}$$

where $\mathcal{L}(0)$ is the likelihood of the model with only the intercept, $\mathcal{L}(\widehat{\theta})$ is the likelihood of the estimated model (i.e., the model with a given set of parameter estimates) and $n$ is the sample size. It is easily rewritten to:

$$R^{2}=1-e^{\frac{2}{n}(\ln(\mathcal{L}(0))-\ln(\mathcal{L}(\widehat{\theta})))}=1-e^{-D/n}$$

where $D = 2(\ln(\mathcal{L}(\widehat{\theta}))-\ln(\mathcal{L}(0)))$ is the test statistic of the likelihood ratio test. Nico Nagelkerke noted that it had the following properties:

1. It is consistent with the classical coefficient of determination when both can be computed;
2. Its value is maximised by the maximum likelihood estimation of a model;
3. It is asymptotically independent of the sample size;
4. The interpretation is the proportion of the variation explained by the model;
5. The values are between 0 and 1, with 0 denoting that the model does not explain any variation and 1 denoting that it perfectly explains the observed variation;
6. It does not have any unit.

However, in the case of a logistic model, where $\mathcal{L}(\widehat{\theta})$ cannot be greater than 1, $R^2$ is between 0 and $R_{\max}^{2}=1-(\mathcal{L}(0))^{2/n}$: thus, Nagelkerke suggested the possibility to define a scaled $R^2$ as $R^{2}/R_{\max}^{2}$.
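The generalized $R^2$, its upper bound and the scaled version can be sketched in a few lines of Python. The example works with log-likelihoods rather than likelihoods to avoid numerical underflow. The function names and the two log-likelihood values are hypothetical and chosen only for illustration; no model is actually fitted here.

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """1 - (L(0) / L(theta_hat))^(2/n), evaluated from log-likelihoods."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

def max_cox_snell_r2(ll_null, n):
    """Upper bound 1 - L(0)^(2/n), reached when L(theta_hat) = 1."""
    return 1.0 - math.exp((2.0 / n) * ll_null)

def nagelkerke_r2(ll_null, ll_model, n):
    """Scaled R^2: the generalized R^2 divided by its maximum."""
    return cox_snell_r2(ll_null, ll_model, n) / max_cox_snell_r2(ll_null, n)

# Hypothetical values (assumptions): intercept-only and fitted log-likelihoods.
n = 100
ll_null = -68.0
ll_model = -52.5

# Likelihood ratio test statistic D = 2 (ln L(theta_hat) - ln L(0)).
lr_statistic = 2.0 * (ll_model - ll_null)

r2_cs = cox_snell_r2(ll_null, ll_model, n)
r2_max = max_cox_snell_r2(ll_null, n)
r2_scaled = nagelkerke_r2(ll_null, ll_model, n)
print(r2_cs, r2_max, r2_scaled)
```

With these inputs `r2_cs` equals `1 - exp(-lr_statistic / n)`, and the scaled value is larger than `r2_cs` because the upper bound is below 1.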

== Comparison with residual statistics ==

Occasionally, residual statistics are used for indicating goodness of fit. The norm of residuals is calculated as the square root of the sum of squares of residuals (SSR):

$$\text{norm of residuals}=\sqrt{SS_{\text{res}}}=\|e\|.$$

Similarly, the reduced chi-square is calculated as the SSR divided by the degrees of freedom. Both $R^2$ and the norm of residuals have their relative merits. For least squares analysis $R^2$ varies between 0 and 1, with larger numbers indicating better fits and 1 representing a perfect fit. The norm of residuals varies from 0 to infinity with smaller numbers indicating better fits and zero indicating a perfect fit. One advantage and disadvantage of $R^2$ is that the $SS_{\text{tot}}$ term acts to normalize the value. If the $y_i$ values are all multiplied by a constant, the norm of residuals will also change by that constant but $R^2$ will stay the same. As a basic example, for the linear least squares fit to the set of data:

$R^2 = 0.998$, and norm of residuals = 0.302. If all values of $y$ are multiplied by 1000 (for example, in an SI prefix change), then $R^2$ remains the same, but norm of residuals = 302. Another single-parameter indicator of fit is the RMSE of the residuals, or standard deviation of the residuals. This would have a value of 0.135 for the above example given that the fit was linear with an unforced intercept.
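The scaling behaviour described above can be illustrated with a short Python sketch. The data points are not given in the text, so the values below are an assumption, chosen because a straight-line fit to them reproduces the quoted figures. The function name `linear_fit_stats` is likewise hypothetical, and the RMSE is taken here as the square root of SSR divided by the number of points.

```python
import math

def linear_fit_stats(x, y):
    """Fit y = intercept + slope * x by least squares and summarize the fit."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return {
        "r2": 1.0 - ss_res / ss_tot,    # normalized by SS_tot
        "norm": math.sqrt(ss_res),      # norm of residuals
        "rmse": math.sqrt(ss_res / n),  # root mean square of residuals
    }

# Assumed data set (not stated in the text above).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.9, 3.7, 5.8, 8.0, 9.6]

fit = linear_fit_stats(x, y)
fit_scaled = linear_fit_stats(x, [1000.0 * yi for yi in y])

print(round(fit["r2"], 3), round(fit["norm"], 3), round(fit["rmse"], 3))
print(round(fit_scaled["r2"], 3), round(fit_scaled["norm"]))
```

Under this assumption the first line prints 0.998, 0.302 and 0.135, and the second shows the same $R^2$ with a norm of residuals of 302.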

== History ==

The creation of the coefficient of determination has been attributed to the geneticist Sewall Wright and was first published in 1921.

== See also ==

* Anscombe's quartet
* Fraction of variance unexplained
* Goodness of fit
* Nash–Sutcliffe model efficiency coefficient (hydrological applications)
* Pearson product-moment correlation coefficient
* Proportional reduction in loss
* Regression model validation
* Root mean square deviation
* Stepwise regression

== Further reading ==

* Gujarati, Damodar N.; Porter, Dawn C. (2009). Basic Econometrics (Fifth ed.). New York: McGraw-Hill/Irwin. pp. 73–78. ISBN 978-0-07-337577-9.
* Hughes, Ann; Grawoig, Dennis (1971). Statistics: A Foundation for Analysis. Reading: Addison-Wesley. pp. 344–348. ISBN 0-201-03021-7.
* Kmenta, Jan (1986). Elements of Econometrics (Second ed.). New York: Macmillan. pp. 240–243. ISBN 978-0-02-365070-3.
* Lewis-Beck, Michael S.; Skalaban, Andrew (1990). "The R-Squared: Some Straight Talk". Political Analysis. 2: 153–171. doi:10.1093/pan/2.1.153. JSTOR 23317769.