---
title: "Coefficient of determination"
chunk: 6/6
source: "https://en.wikipedia.org/wiki/Coefficient_of_determination"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T07:23:31.318214+00:00"
instance: "kb-cron"
---
It is assumed that the matrix $X$ is standardized with Z-scores and that the column vector $y$ is centered to have a mean of zero. Let the column vector $\beta_0$ refer to the hypothesized regression parameters and let the column vector $b$ denote the estimated parameters. We can then define

$$R^{2}=1-\frac{(y-Xb)'(y-Xb)}{(y-X\beta_0)'(y-X\beta_0)}.$$

An $R^2$ of 75% means that the in-sample accuracy improves by 75% if the data-optimized $b$ solutions are used instead of the hypothesized $\beta_0$ values. In the special case that $\beta_0$ is a vector of zeros, we obtain the traditional $R^2$ again.
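
The definition above can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic data; the helper name `r2_vs_hypothesis`, the random seed, and the coefficients used to generate `y` are illustrative choices, not part of the original text. It confirms that a zero hypothesis recovers the traditional value (computed here as the squared correlation between `y` and the fitted values) and that hypothesizing the fitted parameters themselves leaves nothing to improve.

```python
import numpy as np

def r2_vs_hypothesis(X, y, beta0):
    """R^2 = 1 - (y-Xb)'(y-Xb) / (y-X beta0)'(y-X beta0), with b from least squares."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    res_b = y - X @ b          # residuals under the estimated parameters
    res_0 = y - X @ beta0      # residuals under the hypothesized parameters
    return 1.0 - (res_b @ res_b) / (res_0 @ res_0)

# Synthetic (hypothetical) data, prepared as the text assumes:
# X standardized with Z-scores, y centered to mean zero.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)
y = y - y.mean()

# Special case beta0 = 0: should match the traditional R^2.
r2_zero = r2_vs_hypothesis(X, y, np.zeros(p))

# Independent check: squared correlation between y and the fitted values.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
r2_traditional = np.corrcoef(y, X @ b)[0, 1] ** 2

# Hypothesizing the fitted parameters themselves gives no improvement.
r2_at_b = r2_vs_hypothesis(X, y, b)

print(r2_zero, r2_traditional, r2_at_b)
```

Because both `X` and `y` are centered, fitting without an intercept is equivalent to fitting with one, which is why the squared-correlation check applies here.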
The individual effect on $R^2$ of deviating from a hypothesis can be computed with $R^{\otimes}$ ('R-outer'). This $p$ times $p$ matrix is given by

$$R^{\otimes}=(X'\tilde{y}_0)(X'\tilde{y}_0)'(X'X)^{-1}(\tilde{y}_0'\tilde{y}_0)^{-1},$$

where $\tilde{y}_0=y-X\beta_0$. The diagonal elements of $R^{\otimes}$ exactly add up to $R^2$. If regressors are uncorrelated and $\beta_0$ is a vector of zeros, then the $j^{\text{th}}$ diagonal element of $R^{\otimes}$ simply corresponds to the $r^2$ value between $x_j$ and $y$. When regressors $x_i$ and $x_j$ are correlated, $R_{ii}^{\otimes}$ might increase at the cost of a decrease in $R_{jj}^{\otimes}$. As a result, the diagonal elements of $R^{\otimes}$ may be smaller than 0 and, in more exceptional cases, larger than 1. To deal with such uncertainties, several shrinkage estimators implicitly take a weighted average of the diagonal elements of $R^{\otimes}$ to quantify the relevance of deviating from a hypothesized value. See the lasso for an example.
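
The two stated properties of the R-outer matrix can be illustrated with a short numerical sketch. It assumes NumPy and synthetic data; the function names `r_outer` and `r2_vs_hypothesis`, the seed, and the hypothesized vector `beta0` are hypothetical choices made for illustration. Exactly uncorrelated regressors are constructed here with a QR decomposition, which is one convenient way to obtain them and is not taken from the original text.

```python
import numpy as np

def r_outer(X, y, beta0):
    """p x p matrix (X'y0)(X'y0)'(X'X)^{-1}(y0'y0)^{-1}, with y0 = y - X beta0."""
    y0 = y - X @ beta0
    v = X.T @ y0
    return np.outer(v, v) @ np.linalg.inv(X.T @ X) / (y0 @ y0)

def r2_vs_hypothesis(X, y, beta0):
    """R^2 relative to the hypothesized parameters beta0."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    res_b, res_0 = y - X @ b, y - X @ beta0
    return 1.0 - (res_b @ res_b) / (res_0 @ res_0)

rng = np.random.default_rng(1)
n, p = 300, 3

# Correlated regressors (synthetic), standardized; y centered.
Z = rng.normal(size=(n, p))
Z[:, 1] += 0.8 * Z[:, 0]
X = (Z - Z.mean(axis=0)) / Z.std(axis=0)
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
y = y - y.mean()

# Property 1: the diagonal of R-outer sums to R^2 for any hypothesis.
beta0 = np.array([0.5, 0.0, 0.0])
trace_R = np.trace(r_outer(X, y, beta0))
r2 = r2_vs_hypothesis(X, y, beta0)

# Property 2: with uncorrelated regressors and beta0 = 0, the j-th diagonal
# element equals the squared correlation between x_j and y.
Q, _ = np.linalg.qr(X)            # orthonormal, still mean-zero columns
X_orth = Q / Q.std(axis=0)        # rescale to unit standard deviation
diag_orth = np.diag(r_outer(X_orth, y, np.zeros(p)))
pairwise_r2 = np.array(
    [np.corrcoef(X_orth[:, j], y)[0, 1] ** 2 for j in range(p)]
)

print(trace_R, r2)
print(diag_orth, pairwise_r2)
```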
=== $R^2$ in logistic regression ===
In the case of logistic regression, usually fit by maximum likelihood, there are several choices of pseudo-$R^2$.
One is the generalized $R^2$ originally proposed by Cox & Snell, and independently by Magee:

$$R^{2}=1-\left(\frac{\mathcal{L}(0)}{\mathcal{L}(\widehat{\theta})}\right)^{2/n}$$

where $\mathcal{L}(0)$ is the likelihood of the model with only the intercept, $\mathcal{L}(\widehat{\theta})$ is the likelihood of the estimated model (i.e., the model with a given set of parameter estimates) and $n$ is the sample size. It is easily rewritten to:

$$R^{2}=1-e^{\frac{2}{n}\left(\ln \mathcal{L}(0)-\ln \mathcal{L}(\widehat{\theta})\right)}=1-e^{-D/n}$$

where $D=2\left(\ln \mathcal{L}(\widehat{\theta})-\ln \mathcal{L}(0)\right)$ is the test statistic of the likelihood ratio test.
Nico Nagelkerke noted that it had the following properties:
It is consistent with the classical coefficient of determination when both can be computed;
Its value is maximised by the maximum likelihood estimation of a model;
It is asymptotically independent of the sample size;
The interpretation is the proportion of the variation explained by the model;
The values are between 0 and 1, with 0 denoting that the model does not explain any variation and 1 denoting that it perfectly explains the observed variation;
It does not have any unit.
However, in the case of a logistic model, where $\mathcal{L}(\widehat{\theta})$ cannot be greater than 1, $R^2$ is between 0 and $R_{\max}^{2}=1-(\mathcal{L}(0))^{2/n}$. Nagelkerke therefore suggested defining a scaled $R^2$ as $R^2/R_{\max}^{2}$.
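
As a worked illustration of the formulas in this section, the sketch below computes the generalized $R^2$, its upper bound, and the scaled version from log-likelihoods, using only the Python standard library. The function names and the small data set are hypothetical, and the "fitted" probabilities are made-up values used in place of an actual maximum likelihood fit.

```python
import math

def bernoulli_loglik(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def generalized_r2(loglik_null, loglik_model, n):
    """R^2 = 1 - (L(0)/L(theta_hat))^(2/n), evaluated on the log scale."""
    return 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))

def max_r2(loglik_null, n):
    """R^2_max = 1 - L(0)^(2/n), the bound reached when L(theta_hat) = 1."""
    return 1.0 - math.exp((2.0 / n) * loglik_null)

# Hypothetical outcomes and illustrative (not actually fitted) probabilities.
y = [0, 0, 1, 0, 1, 1, 1, 0]
p_model = [0.1, 0.2, 0.7, 0.4, 0.8, 0.9, 0.6, 0.3]
n = len(y)

# Intercept-only model: every observation gets the sample proportion.
p_null = [sum(y) / n] * n

ll_null = bernoulli_loglik(y, p_null)
ll_model = bernoulli_loglik(y, p_model)

D = 2.0 * (ll_model - ll_null)          # likelihood ratio statistic
r2_cs = generalized_r2(ll_null, ll_model, n)
r2_max = max_r2(ll_null, n)
r2_scaled = r2_cs / r2_max              # scaled version suggested by Nagelkerke

print(r2_cs, r2_max, r2_scaled)
```

Working with log-likelihoods rather than raw likelihoods avoids numerical underflow for larger samples, since the likelihood itself shrinks rapidly as $n$ grows.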
== Comparison with residual statistics ==
Occasionally, residual statistics are used for indicating goodness of fit. The norm of residuals is calculated as the square-root of the sum of squares of residuals (SSR):

$$\text{norm of residuals}=\sqrt{SS_{\text{res}}}=\|e\|.$$

Similarly, the reduced chi-square is calculated as the SSR divided by the degrees of freedom.
Both $R^2$ and the norm of residuals have their relative merits. For least squares analysis $R^2$ varies between 0 and 1, with larger numbers indicating better fits and 1 representing a perfect fit. The norm of residuals varies from 0 to infinity with smaller numbers indicating better fits and zero indicating a perfect fit. One advantage and disadvantage of $R^2$ is that the $SS_{\text{tot}}$ term acts to normalize the value. If the $y_i$ values are all multiplied by a constant, the norm of residuals will also change by that constant but $R^2$ will stay the same. As a basic example, for the linear least squares fit to the set of data:
$R^2$ = 0.998, and norm of residuals = 0.302.
If all values of $y$ are multiplied by 1000 (for example, in an SI prefix change), then $R^2$ remains the same, but norm of residuals = 302.
Another single-parameter indicator of fit is the RMSE of the residuals, or standard deviation of the residuals. This would have a value of 0.135 for the above example given that the fit was linear with an unforced intercept.
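
The scaling behaviour described in this section can be reproduced with a few lines of NumPy. The data table for the example is not reproduced in this text, so the five points below are an assumption: they were chosen because a straight-line fit to them yields the quoted figures (0.998, 0.302 and 0.135). The helper name `fit_stats` is likewise illustrative.

```python
import numpy as np

# Assumed data set (not given in the text); it reproduces the quoted values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 3.7, 5.8, 8.0, 9.6])

def fit_stats(x, y):
    """Fit y = slope*x + intercept by least squares and summarize the residuals."""
    slope, intercept = np.polyfit(x, y, 1)
    e = y - (slope * x + intercept)           # residual vector
    ss_res = e @ e
    ss_tot = np.sum((y - y.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "norm": np.sqrt(ss_res),              # norm of residuals, ||e||
        "rmse": np.sqrt(ss_res / len(y)),     # root mean square of residuals
    }

base = fit_stats(x, y)
scaled = fit_stats(x, 1000.0 * y)             # e.g. a change of SI prefix

print(base)    # r2 ≈ 0.998, norm ≈ 0.302, rmse ≈ 0.135
print(scaled)  # same r2, norm ≈ 302
```

The RMSE here divides the sum of squared residuals by the number of points rather than by the residual degrees of freedom; that choice is what matches the value of 0.135 quoted above.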
== History ==
The creation of the coefficient of determination has been attributed to the geneticist Sewall Wright and was first published in 1921.
== See also ==
Anscombe's quartet
Fraction of variance unexplained
Goodness of fit
Nash–Sutcliffe model efficiency coefficient (hydrological applications)
Pearson product-moment correlation coefficient
Proportional reduction in loss
Regression model validation
Root mean square deviation
Stepwise regression
== Further reading ==
Gujarati, Damodar N.; Porter, Dawn C. (2009). Basic Econometrics (Fifth ed.). New York: McGraw-Hill/Irwin. pp. 73–78. ISBN 978-0-07-337577-9.
Hughes, Ann; Grawoig, Dennis (1971). Statistics: A Foundation for Analysis. Reading: Addison-Wesley. pp. 344–348. ISBN 0-201-03021-7.
Kmenta, Jan (1986). Elements of Econometrics (Second ed.). New York: Macmillan. pp. 240–243. ISBN 978-0-02-365070-3.
Lewis-Beck, Michael S.; Skalaban, Andrew (1990). "The R-Squared: Some Straight Talk". Political Analysis. 2: 153–171. doi:10.1093/pan/2.1.153. JSTOR 23317769.