---
title: "Coefficient of determination"
chunk: 6/6
source: "https://en.wikipedia.org/wiki/Coefficient_of_determination"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T07:23:31.318214+00:00"
instance: "kb-cron"
---

It is assumed that the matrix X is standardized with Z-scores and that the column vector {\displaystyle y} is centered to have a mean of zero. Let the column vector {\displaystyle \beta _{0}} refer to the hypothesized regression parameters and let the column vector {\displaystyle b} denote the estimated parameters. We can then define

  {\displaystyle R^{2}=1-{\frac {(y-Xb)'(y-Xb)}{(y-X\beta _{0})'(y-X\beta _{0})}}.}

An R² of 75% means that the in-sample sum of squared residuals is 75% lower when the data-optimized b solutions are used instead of the hypothesized {\displaystyle \beta _{0}} values. In the special case that {\displaystyle \beta _{0}} is a vector of zeros, we obtain the traditional R² again.
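
The definition above can be sketched numerically. The following is a minimal illustration, assuming NumPy and simulated data; the helper name `r2_against_hypothesis`, the seed, and the true coefficients are hypothetical choices made only for this example.

```python
import numpy as np

def r2_against_hypothesis(X, y, beta0):
    """R^2 = 1 - RSS(b) / RSS(beta0), with b the least-squares estimate."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_fit = np.sum((y - X @ b) ** 2)        # (y - Xb)'(y - Xb)
    rss_hyp = np.sum((y - X @ beta0) ** 2)    # (y - X beta0)'(y - X beta0)
    return 1.0 - rss_fit / rss_hyp

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize columns to Z-scores
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)
y = y - y.mean()                              # center y

# With beta0 = 0 the denominator is the total sum of squares of the centered y,
# so the result should agree with the traditional R^2.
r2_zero = r2_against_hypothesis(X, y, np.zeros(p))
b, *_ = np.linalg.lstsq(X, y, rcond=None)
r2_traditional = 1.0 - np.sum((y - X @ b) ** 2) / np.sum((y - y.mean()) ** 2)
```

If the hypothesized vector already equals the least-squares estimate, the ratio is 1 and this R² is 0, since the data-optimized solution offers no improvement over the hypothesis.
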
The individual effect on R² of deviating from a hypothesis can be computed with {\displaystyle R^{\otimes }} ('R-outer'). This {\displaystyle p\times p} matrix is given by

  {\displaystyle R^{\otimes }=(X'{\tilde {y}}_{0})(X'{\tilde {y}}_{0})'(X'X)^{-1}({\tilde {y}}_{0}'{\tilde {y}}_{0})^{-1},}

where {\displaystyle {\tilde {y}}_{0}=y-X\beta _{0}}. The diagonal elements of {\displaystyle R^{\otimes }} exactly add up to R². If regressors are uncorrelated and {\displaystyle \beta _{0}} is a vector of zeros, then the {\displaystyle j^{\text{th}}} diagonal element of {\displaystyle R^{\otimes }} simply corresponds to the r² value between {\displaystyle x_{j}} and {\displaystyle y}. When regressors {\displaystyle x_{i}} and {\displaystyle x_{j}} are correlated, {\displaystyle R_{ii}^{\otimes }} might increase at the cost of a decrease in {\displaystyle R_{jj}^{\otimes }}. As a result, the diagonal elements of {\displaystyle R^{\otimes }} may be smaller than 0 and, in more exceptional cases, larger than 1. To deal with such uncertainties, several shrinkage estimators implicitly take a weighted average of the diagonal elements of {\displaystyle R^{\otimes }} to quantify the relevance of deviating from a hypothesized value. See the lasso for an example.
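
The R-outer matrix and two of the properties stated above (the diagonal summing to R², and the diagonal reducing to squared correlations for uncorrelated regressors with a zero hypothesis) can be checked with a short sketch. It assumes NumPy and simulated data; the function name `r_outer` and all numeric settings are hypothetical.

```python
import numpy as np

def r_outer(X, y, beta0):
    """R-outer = (X'y0)(X'y0)'(X'X)^{-1}(y0'y0)^{-1}, where y0 = y - X beta0."""
    y0 = y - X @ beta0
    g = X.T @ y0                                   # X'y0, a length-p vector
    return np.outer(g, g) @ np.linalg.inv(X.T @ X) / (y0 @ y0)

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(1)
n, p = 500, 3
X = rng.normal(size=(n, p))
X[:, 1] += 0.6 * X[:, 0]                           # make two regressors correlated
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize columns to Z-scores
y = X @ np.array([0.8, 0.3, -0.4]) + rng.normal(size=n)
y = y - y.mean()                                   # center y

beta0 = np.zeros(p)
R_out = r_outer(X, y, beta0)

# R^2 relative to the hypothesis, using the least-squares estimate b.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
r2 = 1.0 - np.sum((y - X @ b) ** 2) / np.sum((y - X @ beta0) ** 2)

# Uncorrelated case: QR gives orthonormal columns that stay zero-mean because
# they are linear combinations of the centered columns of X.
Q, _ = np.linalg.qr(X)
R_orth = r_outer(Q, y, np.zeros(p))
r_squared = np.array([np.corrcoef(Q[:, j], y)[0, 1] ** 2 for j in range(p)])
```

Here `np.trace(R_out)` should match `r2`, and the diagonal of `R_orth` should match the individual squared correlations in `r_squared`.
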

=== R² in logistic regression ===
In the case of logistic regression, usually fit by maximum likelihood, there are several choices of pseudo-R².
One is the generalized R² originally proposed by Cox & Snell, and independently by Magee:

  {\displaystyle R^{2}=1-\left({\frac {{\mathcal {L}}(0)}{{\mathcal {L}}({\widehat {\theta }})}}\right)^{2/n}}

where {\displaystyle {\mathcal {L}}(0)} is the likelihood of the model with only the intercept, {\displaystyle {\mathcal {L}}({\widehat {\theta }})} is the likelihood of the estimated model (i.e., the model with a given set of parameter estimates) and n is the sample size. It is easily rewritten to:

  {\displaystyle R^{2}=1-e^{{\frac {2}{n}}(\ln({\mathcal {L}}(0))-\ln({\mathcal {L}}({\widehat {\theta }})))}=1-e^{-D/n}}

where D is the test statistic of the likelihood ratio test.
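
As a rough numerical check of the two equivalent forms of the generalized R², the sketch below works on the log scale. It assumes NumPy; the function names are hypothetical, and the probabilities in `p_hat` are made-up stand-ins for the fitted values of a logistic model rather than actual maximum likelihood estimates.

```python
import numpy as np

def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under success probabilities p."""
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def cox_snell_r2(loglik_null, loglik_model, n):
    """Generalized R^2 = 1 - exp((2/n) * (ln L(0) - ln L(theta_hat)))."""
    return 1.0 - np.exp((2.0 / n) * (loglik_null - loglik_model))

y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
p_hat = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])  # hypothetical values
n = len(y)

# Intercept-only model: every observation gets the sample proportion of ones.
ll_null = bernoulli_log_likelihood(y, np.full(n, y.mean()))
ll_model = bernoulli_log_likelihood(y, p_hat)

D = 2.0 * (ll_model - ll_null)          # likelihood ratio test statistic
r2_cs = cox_snell_r2(ll_null, ll_model, n)
r2_from_ratio = 1.0 - (np.exp(ll_null) / np.exp(ll_model)) ** (2.0 / n)
```

Working with log-likelihoods rather than raw likelihoods is a design choice that avoids numerical underflow when n is large.
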
Nico Nagelkerke noted that this generalized R² has the following properties:

It is consistent with the classical coefficient of determination when both can be computed;
Its value is maximised by the maximum likelihood estimation of a model;
It is asymptotically independent of the sample size;
The interpretation is the proportion of the variation explained by the model;
The values are between 0 and 1, with 0 denoting that the model does not explain any variation and 1 denoting that it perfectly explains the observed variation;
It does not have any unit.
However, in the case of a logistic model, where {\displaystyle {\mathcal {L}}({\widehat {\theta }})} cannot be greater than 1, R² is between 0 and {\displaystyle R_{\max }^{2}=1-({\mathcal {L}}(0))^{2/n}}: thus, Nagelkerke suggested defining a scaled R² as {\displaystyle R^{2}/R_{\max }^{2}}.
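
The scaling can be illustrated with a small sketch, assuming NumPy. The function name `nagelkerke_r2` and the value of `ll_model` are hypothetical; the null log-likelihood corresponds to an intercept-only model on a balanced binary sample of size 8.

```python
import numpy as np

def nagelkerke_r2(loglik_null, loglik_model, n):
    """Scaled R^2 = R^2 / R^2_max, with R^2_max = 1 - L(0)^(2/n)."""
    r2 = 1.0 - np.exp((2.0 / n) * (loglik_null - loglik_model))
    r2_max = 1.0 - np.exp((2.0 / n) * loglik_null)
    return r2 / r2_max

n = 8
ll_null = n * np.log(0.5)   # intercept-only log-likelihood, balanced 0/1 sample
ll_model = -3.2             # hypothetical log-likelihood of a fitted model

r2_unscaled = 1.0 - np.exp((2.0 / n) * (ll_null - ll_model))
r2_scaled = nagelkerke_r2(ll_null, ll_model, n)

# A model whose likelihood reaches its upper bound of 1 (log-likelihood 0)
# attains the maximum, so its scaled value is 1.
r2_perfect = nagelkerke_r2(ll_null, 0.0, n)
```

Because R²max is below 1, the scaled value is always at least as large as the unscaled one.
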

== Comparison with residual statistics ==
Occasionally, residual statistics are used to indicate goodness of fit. The norm of residuals is calculated as the square root of the sum of squares of residuals (SSR):

  {\displaystyle {\text{norm of residuals}}={\sqrt {SS_{\text{res}}}}=\|e\|.}

Similarly, the reduced chi-square is calculated as the SSR divided by the degrees of freedom.
Both R² and the norm of residuals have their relative merits. For least squares analysis R² varies between 0 and 1, with larger numbers indicating better fits and 1 representing a perfect fit. The norm of residuals varies from 0 to infinity, with smaller numbers indicating better fits and zero indicating a perfect fit. One property of R² that is both an advantage and a disadvantage is that the {\displaystyle SS_{\text{tot}}} term acts to normalize the value. If the {\displaystyle y_{i}} values are all multiplied by a constant, the norm of residuals will also change by that constant but R² will stay the same. As a basic example, for the linear least squares fit to the set of data:

R² = 0.998, and norm of residuals = 0.302.
If all values of y are multiplied by 1000 (for example, in an SI prefix change), then R² remains the same, but norm of residuals = 302.
Another single-parameter indicator of fit is the RMSE of the residuals, or standard deviation of the residuals. This would have a value of 0.135 for the above example given that the fit was linear with an unforced intercept.
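
The comparison can be reproduced with a short sketch, assuming NumPy. The data set itself is not shown in the text above, so the `x` and `y` values below are an assumption: they were chosen because a straight-line fit to them yields the quoted figures. The helper name `fit_stats` is hypothetical.

```python
import numpy as np

# Assumed data (not given in the text); selected to match the quoted statistics.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

def fit_stats(x, y):
    """Fit a line with a free intercept and return R^2, residual norm and RMSE."""
    slope, intercept = np.polyfit(x, y, 1)
    e = y - (slope * x + intercept)            # residuals
    ss_res = np.sum(e ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "norm": np.sqrt(ss_res),               # norm of residuals, ||e||
        "rmse": np.sqrt(ss_res / len(y)),      # root mean square of residuals
    }

stats = fit_stats(x, y)
scaled = fit_stats(x, 1000 * y)                # same data with y multiplied by 1000
```

Rounded to three decimals, `stats` gives R² of 0.998, a norm of residuals of 0.302 and an RMSE of 0.135, while `scaled` keeps the same R² but has a norm of residuals of about 302.
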

== History ==
The creation of the coefficient of determination has been attributed to the geneticist Sewall Wright and was first published in 1921.

== See also ==
Anscombe's quartet
Fraction of variance unexplained
Goodness of fit
Nash–Sutcliffe model efficiency coefficient (hydrological applications)
Pearson product-moment correlation coefficient
Proportional reduction in loss
Regression model validation
Root mean square deviation
Stepwise regression
