---
title: "Coefficient of determination"
chunk: 6/6
source: "https://en.wikipedia.org/wiki/Coefficient_of_determination"
category: "reference"
tags: "science, encyclopedia"
date_saved: "2026-05-05T07:23:31.318214+00:00"
instance: "kb-cron"
---

It is assumed that the matrix $X$ is standardized with Z-scores and that the column vector $y$ is centered to have a mean of zero. Let the column vector $\beta_0$ refer to the hypothesized regression parameters and let the column vector $b$ denote the estimated parameters. We can then define

$$R^{2} = 1 - \frac{(y - Xb)'(y - Xb)}{(y - X\beta_0)'(y - X\beta_0)}.$$

An $R^2$ of 75% means that the in-sample accuracy improves by 75% (that is, the sum of squared errors falls by 75%) if the data-optimized $b$ solutions are used instead of the hypothesized $\beta_0$ values. In the special case that $\beta_0$ is a vector of zeros, we obtain the traditional $R^2$ again.

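As a rough illustration of this definition, the Python sketch below evaluates the ratio of the two sums of squared errors directly. The function name `generalized_r2` and the toy data are assumptions made for the example: a single regressor with mean 0 and population variance 1, and a response centered at zero.

```python
def generalized_r2(X, y, b, beta0):
    """R^2 = 1 - (y - Xb)'(y - Xb) / ((y - X beta0)'(y - X beta0))."""

    def sse(coef):
        # Sum of squared errors of the linear prediction X @ coef.
        return sum(
            (yi - sum(xij * cj for xij, cj in zip(row, coef))) ** 2
            for row, yi in zip(X, y)
        )

    return 1.0 - sse(b) / sse(beta0)


# Toy data (assumed): one regressor with mean 0 and population variance 1,
# and a response centered at zero.
X = [[-1.0], [-1.0], [1.0], [1.0]]
y = [-2.0, -1.0, 1.0, 2.0]

# Least-squares slope for a single regressor: b = (x'y) / (x'x).
b = [sum(r[0] * yi for r, yi in zip(X, y)) / sum(r[0] ** 2 for r in X)]

r2_vs_zero = generalized_r2(X, y, b, [0.0])  # beta0 = 0: traditional R^2
r2_vs_one = generalized_r2(X, y, b, [1.0])   # hypothesized slope of 1
print(r2_vs_zero, r2_vs_one)
```

For this data the estimated slope is 1.5. Against a zero hypothesis the value is 0.9, and against a hypothesized slope of 1 it drops to 0.5, because that hypothesis already accounts for part of the error.
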
The individual effect on $R^2$ of deviating from a hypothesis can be computed with $R^{\otimes}$ ('R-outer'). This $p \times p$ matrix is given by

$$R^{\otimes} = (X'\tilde{y}_0)(X'\tilde{y}_0)'(X'X)^{-1}(\tilde{y}_0'\tilde{y}_0)^{-1},$$

where $\tilde{y}_0 = y - X\beta_0$. The diagonal elements of $R^{\otimes}$ exactly add up to $R^2$. If regressors are uncorrelated and $\beta_0$ is a vector of zeros, then the $j^{\text{th}}$ diagonal element of $R^{\otimes}$ simply corresponds to the $r^2$ value between $x_j$ and $y$. When regressors $x_i$ and $x_j$ are correlated, $R_{ii}^{\otimes}$ might increase at the cost of a decrease in $R_{jj}^{\otimes}$. As a result, the diagonal elements of $R^{\otimes}$ may be smaller than 0 and, in more exceptional cases, larger than 1. To deal with such uncertainties, several shrinkage estimators implicitly take a weighted average of the diagonal elements of $R^{\otimes}$ to quantify the relevance of deviating from a hypothesized value. See the lasso for an example.

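The decomposition can be checked numerically. The sketch below is limited to exactly two regressors so that $(X'X)^{-1}$ can be written out by hand; the function name `r_outer_2x2` and the toy data (two correlated regressors with mean 0 and population variance 1, and a centered response) are assumptions made for the example.

```python
import math


def r_outer_2x2(X, y, beta0):
    """R-outer = (X'yt)(X'yt)'(X'X)^{-1}(yt'yt)^{-1} for exactly two regressors."""
    # yt = y - X beta0: the response after removing the hypothesized fit.
    yt = [yi - (r[0] * beta0[0] + r[1] * beta0[1]) for r, yi in zip(X, y)]
    # g = X'yt, a vector of length 2.
    g = [sum(r[j] * t for r, t in zip(X, yt)) for j in range(2)]
    # X'X and its explicit 2x2 inverse.
    a11 = sum(r[0] * r[0] for r in X)
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X)
    det = a11 * a22 - a12 * a12
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    s = sum(t * t for t in yt)  # yt'yt, a scalar
    # Entry (i, j) of (g g') inv / s.
    return [
        [sum(g[i] * g[k] * inv[k][j] for k in range(2)) / s for j in range(2)]
        for i in range(2)
    ]


# Toy data (assumed): two correlated regressors, each with mean 0 and
# population variance 1, and a response centered at zero.
root2 = math.sqrt(2.0)
X = [[-1.0, -root2], [-1.0, 0.0], [1.0, 0.0], [1.0, root2]]
y = [-2.0, -1.0, 0.0, 3.0]

R = r_outer_2x2(X, y, [0.0, 0.0])
diagonal = [R[0][0], R[1][1]]
print(diagonal, sum(diagonal))
```

For this data the diagonal is (3/14, 10/14). Its sum, 13/14, equals the $R^2$ of the least-squares fit, whose residual sum of squares is 1 against a total sum of squares of 14.
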
=== R² in logistic regression ===

In the case of logistic regression, usually fit by maximum likelihood, there are several choices of pseudo-$R^2$.

One is the generalized $R^2$ originally proposed by Cox & Snell, and independently by Magee:

$$R^{2} = 1 - \left(\frac{\mathcal{L}(0)}{\mathcal{L}(\widehat{\theta})}\right)^{2/n}$$

where $\mathcal{L}(0)$ is the likelihood of the model with only the intercept, $\mathcal{L}(\widehat{\theta})$ is the likelihood of the estimated model (i.e., the model with a given set of parameter estimates) and $n$ is the sample size. It is easily rewritten to:

$$R^{2} = 1 - e^{\frac{2}{n}\left(\ln \mathcal{L}(0) - \ln \mathcal{L}(\widehat{\theta})\right)} = 1 - e^{-D/n}$$

where $D = 2\left(\ln \mathcal{L}(\widehat{\theta}) - \ln \mathcal{L}(0)\right)$ is the test statistic of the likelihood ratio test.

Nico Nagelkerke noted that it had the following properties:

* It is consistent with the classical coefficient of determination when both can be computed;
* Its value is maximised by the maximum likelihood estimation of a model;
* It is asymptotically independent of the sample size;
* The interpretation is the proportion of the variation explained by the model;
* The values are between 0 and 1, with 0 denoting that the model does not explain any variation and 1 denoting that it perfectly explains the observed variation;
* It does not have any unit.

However, in the case of a logistic model, where $\mathcal{L}(\widehat{\theta})$ cannot be greater than 1, $R^2$ is between 0 and $R_{\max}^{2} = 1 - (\mathcal{L}(0))^{2/n}$: thus, Nagelkerke suggested the possibility to define a scaled $R^2$ as $R^2 / R_{\max}^2$.

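Both quantities can be computed from log-likelihoods, which avoids the underflow that raw likelihoods would cause for larger samples. The following is a minimal sketch; the function names and the example log-likelihood values are assumptions, not output from a fitted model.

```python
import math


def cox_snell_r2(loglik_null, loglik_model, n):
    """1 - (L(0) / L(theta_hat))^(2/n), computed from log-likelihoods."""
    return 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Cox & Snell R^2 divided by its maximum, 1 - L(0)^(2/n)."""
    r2_max = 1.0 - math.exp((2.0 / n) * loglik_null)
    return cox_snell_r2(loglik_null, loglik_model, n) / r2_max


# Assumed example: 100 binary outcomes, half of them ones, so the
# intercept-only model assigns probability 0.5 to every observation.
n = 100
loglik_null = n * math.log(0.5)
loglik_model = -50.0  # hypothetical fitted log-likelihood

r2_cs = cox_snell_r2(loglik_null, loglik_model, n)
r2_scaled = nagelkerke_r2(loglik_null, loglik_model, n)
print(r2_cs, r2_scaled)
```

In this setup the maximum attainable value is $1 - 0.5^2 = 0.75$, so the scaled value is the Cox & Snell value divided by 0.75, and a model with likelihood 1 would reach a scaled value of exactly 1.
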
== Comparison with residual statistics ==

Occasionally, residual statistics are used for indicating goodness of fit. The norm of residuals is calculated as the square root of the sum of squares of residuals (SSR):

$$\text{norm of residuals} = \sqrt{SS_{\text{res}}} = \|e\|.$$

Similarly, the reduced chi-square is calculated as the SSR divided by the degrees of freedom.

Both $R^2$ and the norm of residuals have their relative merits. For least squares analysis $R^2$ varies between 0 and 1, with larger numbers indicating better fits and 1 representing a perfect fit. The norm of residuals varies from 0 to infinity with smaller numbers indicating better fits and zero indicating a perfect fit. One advantage and disadvantage of $R^2$ is that the $SS_{\text{tot}}$ term acts to normalize the value. If the $y_i$ values are all multiplied by a constant, the norm of residuals will also change by that constant but $R^2$ will stay the same. As a basic example, a linear least squares fit to an example data set gives $R^2 = 0.998$ and a norm of residuals of 0.302. If all values of $y$ are multiplied by 1000 (for example, in an SI prefix change), then $R^2$ remains the same, but the norm of residuals becomes 302.

Another single-parameter indicator of fit is the RMSE of the residuals, or standard deviation of the residuals. This would have a value of 0.135 for the above example given that the fit was linear with an unforced intercept.

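The scaling behaviour can be reproduced with a short script. The data table for the example is not present in this text, so the five points below are an assumed data set, chosen because a straight-line fit to them yields the quoted figures; the function name `fit_stats` is likewise hypothetical.

```python
import math


def fit_stats(x, y):
    """Fit y = intercept + slope * x by least squares and summarize the fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "norm": math.sqrt(ss_res),      # norm of residuals
        "rmse": math.sqrt(ss_res / n),  # root mean square of residuals
    }


# Assumed data set (not taken from the text above).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.9, 3.7, 5.8, 8.0, 9.6]

original = fit_stats(x, y)
scaled = fit_stats(x, [1000.0 * yi for yi in y])  # e.g. an SI prefix change
print(original)
print(scaled)
```

With these points the fit gives an $R^2$ of about 0.998, a norm of residuals of about 0.302 and an RMSE of about 0.135. After multiplying $y$ by 1000 the norm of residuals grows to about 302 while $R^2$ is unchanged.
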
== History ==

The creation of the coefficient of determination has been attributed to the geneticist Sewall Wright and was first published in 1921.

== See also ==

* Anscombe's quartet
* Fraction of variance unexplained
* Goodness of fit
* Nash–Sutcliffe model efficiency coefficient (hydrological applications)
* Pearson product-moment correlation coefficient
* Proportional reduction in loss
* Regression model validation
* Root mean square deviation
* Stepwise regression
