---
title: "Coefficient of determination"
chunk: 6/6
source: "https://en.wikipedia.org/wiki/Coefficient_of_determination"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T07:23:31.318214+00:00"
instance: "kb-cron"
---

It is assumed that the matrix $X$ is standardized with Z-scores and that the column vector $y$ is centered to have a mean of zero. Let the column vector $\beta_0$ refer to the hypothesized regression parameters and let the column vector $b$ denote the estimated parameters. We can then define

$$R^{2} = 1 - \frac{(y - Xb)'(y - Xb)}{(y - X\beta_{0})'(y - X\beta_{0})}.$$

An $R^2$ of 75% means that the in-sample accuracy improves by 75% if the data-optimized $b$ solutions are used instead of the hypothesized $\beta_0$ values. In the special case that $\beta_0$ is a vector of zeros, we obtain the traditional $R^2$ again.

The individual effect on $R^2$ of deviating from a hypothesis can be computed with $R^{\otimes}$ ('R-outer'). This $p \times p$ matrix is given by

$$R^{\otimes} = (X'\tilde{y}_{0})(X'\tilde{y}_{0})'(X'X)^{-1}(\tilde{y}_{0}'\tilde{y}_{0})^{-1},$$

where $\tilde{y}_{0} = y - X\beta_{0}$. The diagonal elements of $R^{\otimes}$ exactly add up to $R^2$. If regressors are uncorrelated and $\beta_0$ is a vector of zeros, then the $j^{\text{th}}$ diagonal element of $R^{\otimes}$ simply corresponds to the $r^2$ value between $x_j$ and $y$.
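The two definitions above can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes exactly two regressors (so that a hand-written 2x2 inverse suffices), uses the population standard deviation for the Z-scores, and runs on made-up data. The function names `zscores`, `inverse_2x2`, and `r2_against_hypothesis` are hypothetical helpers introduced only for this illustration.

```python
import math

def zscores(values):
    """Standardize a list to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def r2_against_hypothesis(x1, x2, y, beta0):
    """Return (R^2, R_outer) for two regressors and hypothesized beta0."""
    cols = [zscores(x1), zscores(x2)]          # standardized X, column by column
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]               # centered y

    xtx_inv = inverse_2x2([[dot(ci, cj) for cj in cols] for ci in cols])
    xty = [dot(c, yc) for c in cols]
    b = [dot(row, xty) for row in xtx_inv]     # least-squares estimate b

    def residuals(beta):
        return [yi - beta[0] * u - beta[1] * v
                for yi, u, v in zip(yc, cols[0], cols[1])]

    e_b, y0 = residuals(b), residuals(beta0)   # y0 is y-tilde_0 = y - X beta0
    r2 = 1 - dot(e_b, e_b) / dot(y0, y0)

    # R_outer = (X'y0)(X'y0)' (X'X)^{-1} / (y0'y0)
    g = [dot(c, y0) for c in cols]
    r_outer = [[sum(g[i] * g[k] * xtx_inv[k][j] for k in range(2)) / dot(y0, y0)
                for j in range(2)] for i in range(2)]
    return r2, r_outer

# Hypothetical data, for illustration only
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

r2_zero, outer_zero = r2_against_hypothesis(x1, x2, y, [0.0, 0.0])
r2_hyp, outer_hyp = r2_against_hypothesis(x1, x2, y, [1.0, 0.5])

print("R^2 vs beta0 = 0:", r2_zero, "trace:", outer_zero[0][0] + outer_zero[1][1])
print("R^2 vs beta0 = (1, 0.5):", r2_hyp, "trace:", outer_hyp[0][0] + outer_hyp[1][1])
```

In both calls the trace of the returned matrix should agree with the returned $R^2$ up to rounding error, which is the "diagonal elements add up to $R^2$" property stated above.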
When regressors $x_i$ and $x_j$ are correlated, $R_{ii}^{\otimes}$ might increase at the cost of a decrease in $R_{jj}^{\otimes}$. As a result, the diagonal elements of $R^{\otimes}$ may be smaller than 0 and, in more exceptional cases, larger than 1. To deal with such uncertainties, several shrinkage estimators implicitly take a weighted average of the diagonal elements of $R^{\otimes}$ to quantify the relevance of deviating from a hypothesized value. The lasso is one example.

=== $R^2$ in logistic regression ===

In the case of logistic regression, usually fit by maximum likelihood, there are several choices of pseudo-$R^2$. One is the generalized $R^2$ originally proposed by Cox & Snell, and independently by Magee:

$$R^{2} = 1 - \left(\frac{\mathcal{L}(0)}{\mathcal{L}(\widehat{\theta})}\right)^{2/n}$$

where $\mathcal{L}(0)$ is the likelihood of the model with only the intercept, $\mathcal{L}(\widehat{\theta})$ is the likelihood of the estimated model (i.e., the model with a given set of parameter estimates) and $n$ is the sample size. It is easily rewritten as

$$R^{2} = 1 - e^{\frac{2}{n}\left(\ln \mathcal{L}(0) - \ln \mathcal{L}(\widehat{\theta})\right)} = 1 - e^{-D/n}$$

where $D$ is the test statistic of the likelihood ratio test.
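As a rough illustration of the two equivalent forms above, the sketch below computes the generalized $R^2$ from log-likelihoods, which avoids underflow when the likelihoods themselves are tiny. The outcomes and the "fitted" probabilities are invented for the example and are not the result of an actual maximum likelihood fit; `bernoulli_log_likelihood` and `cox_snell_r2` are hypothetical helper names.

```python
import math

def bernoulli_log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes under the given success probabilities."""
    return sum(math.log(p if y == 1 else 1.0 - p)
               for y, p in zip(outcomes, probabilities))

def cox_snell_r2(ll_null, ll_model, n):
    """Generalized R^2 = 1 - (L(0) / L(theta_hat))^(2/n), from log-likelihoods."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

# Hypothetical outcomes and stand-in fitted probabilities, for illustration only
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
fitted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
n = len(outcomes)

# Intercept-only model: every observation gets the observed success rate
base_rate = sum(outcomes) / n
ll_null = bernoulli_log_likelihood(outcomes, [base_rate] * n)
ll_model = bernoulli_log_likelihood(outcomes, fitted)

r2 = cox_snell_r2(ll_null, ll_model, n)

# Same quantity via the likelihood ratio statistic D = 2 (ln L(theta_hat) - ln L(0))
D = 2.0 * (ll_model - ll_null)
r2_from_D = 1.0 - math.exp(-D / n)

print(r2, r2_from_D)
```

The two printed values should coincide up to floating-point error, since the second form is only an algebraic rewrite of the first.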
Nico Nagelkerke noted that this generalized $R^2$ has the following properties:

* It is consistent with the classical coefficient of determination when both can be computed.
* Its value is maximised by the maximum likelihood estimation of a model.
* It is asymptotically independent of the sample size.
* Its interpretation is the proportion of the variation explained by the model.
* Its values are between 0 and 1, with 0 denoting that the model does not explain any variation and 1 denoting that it perfectly explains the observed variation.
* It does not have any unit.

However, in the case of a logistic model, where $\mathcal{L}(\widehat{\theta})$ cannot be greater than 1, $R^2$ is between 0 and $R_{\max}^{2} = 1 - (\mathcal{L}(0))^{2/n}$. Nagelkerke therefore suggested defining a scaled $R^2$ as $R^2 / R_{\max}^{2}$.

== Comparison with residual statistics ==

Occasionally, residual statistics are used for indicating goodness of fit. The norm of residuals is calculated as the square root of the sum of squares of residuals (SSR):

$$\text{norm of residuals} = \sqrt{SS_{\text{res}}} = \|e\|.$$

Similarly, the reduced chi-square is calculated as the SSR divided by the degrees of freedom.

Both $R^2$ and the norm of residuals have their relative merits. For least squares analysis $R^2$ varies between 0 and 1, with larger numbers indicating better fits and 1 representing a perfect fit. The norm of residuals varies from 0 to infinity, with smaller numbers indicating better fits and zero indicating a perfect fit. One property of $R^2$, which is both an advantage and a disadvantage, is that the $SS_{\text{tot}}$ term acts to normalize the value. If the $y_i$ values are all multiplied by a constant, the norm of residuals will also change by that constant but $R^2$ will stay the same. As a basic example, for a linear least squares fit to a set of data, $R^2 = 0.998$ and the norm of residuals is 0.302.
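The scaling behaviour can be sketched in a few lines. The data set behind the quoted figures is not reproduced in this text, so the five points below are an assumption: they were picked because an ordinary straight-line fit with a free intercept to them gives the same rounded $R^2$ and norm of residuals. `linear_fit_stats` is a hypothetical helper name.

```python
import math

def linear_fit_stats(x, y):
    """Fit y = a + b*x by least squares; return (R^2, norm of residuals)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res)

# Assumed data (not given in the text), chosen to match the quoted figures
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.9, 3.7, 5.8, 8.0, 9.6]

r2, norm = linear_fit_stats(x, y)

# Multiply every y value by the same constant and refit
y_scaled = [1000.0 * v for v in y]
r2_scaled, norm_scaled = linear_fit_stats(x, y_scaled)

print(round(r2, 3), round(norm, 3))            # 0.998 0.302
print(round(r2_scaled, 3), round(norm_scaled)) # 0.998 302
```

Rescaling $y$ multiplies both $SS_{\text{res}}$ and $SS_{\text{tot}}$ by the square of the constant, so their ratio, and hence $R^2$, is unchanged, while the norm of residuals scales linearly.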
If all values of $y$ are multiplied by 1000 (for example, in an SI prefix change), then $R^2$ remains the same, but the norm of residuals becomes 302. Another single-parameter indicator of fit is the RMSE of the residuals, or standard deviation of the residuals. This would have a value of 0.135 for the above example given that the fit was linear with an unforced intercept.

== History ==

The creation of the coefficient of determination has been attributed to the geneticist Sewall Wright and was first published in 1921.

== See also ==

* Anscombe's quartet
* Fraction of variance unexplained
* Goodness of fit
* Nash–Sutcliffe model efficiency coefficient (hydrological applications)
* Pearson product-moment correlation coefficient
* Proportional reduction in loss
* Regression model validation
* Root mean square deviation
* Stepwise regression

== Further reading ==

* Gujarati, Damodar N.; Porter, Dawn C. (2009). Basic Econometrics (Fifth ed.). New York: McGraw-Hill/Irwin. pp. 73–78. ISBN 978-0-07-337577-9.
* Hughes, Ann; Grawoig, Dennis (1971). Statistics: A Foundation for Analysis. Reading: Addison-Wesley. pp. 344–348. ISBN 0-201-03021-7.
* Kmenta, Jan (1986). Elements of Econometrics (Second ed.). New York: Macmillan. pp. 240–243. ISBN 978-0-02-365070-3.
* Lewis-Beck, Michael S.; Skalaban, Andrew (1990). "The R-Squared: Some Straight Talk". Political Analysis. 2: 153–171. doi:10.1093/pan/2.1.153. JSTOR 23317769.