| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Coefficient of determination | 6/6 | https://en.wikipedia.org/wiki/Coefficient_of_determination | reference | science, encyclopedia | 2026-05-05T07:23:31.318214+00:00 | kb-cron |
It is assumed that the matrix $X$ is standardized with Z-scores and that the column vector $y$ is centered to have a mean of zero. Let the column vector $\beta_0$ refer to the hypothesized regression parameters and let the column vector $b$ denote the estimated parameters. We can then define

$$R^{2}=1-\frac{(y-Xb)'(y-Xb)}{(y-X\beta_{0})'(y-X\beta_{0})}.$$
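The definition can be checked numerically. The sketch below is a minimal pure-Python version, assuming small dense matrices stored as lists of rows. The helper names (`mat_vec`, `sum_sq_residuals`, `generalized_r2`) and the toy data are hypothetical illustrations, not from the source. With the toy data, setting $\beta_0$ to zero reproduces the traditional $1 - SS_{\text{res}}/SS_{\text{tot}}$, because a centered $y$ makes the denominator equal to $SS_{\text{tot}}$.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # a matrix stored as a list of rows


def mat_vec(X: Matrix, v: Sequence[float]) -> List[float]:
    """Return the product X v."""
    return [sum(x_ij * v_j for x_ij, v_j in zip(row, v)) for row in X]


def sum_sq_residuals(X: Matrix, y: Sequence[float], beta: Sequence[float]) -> float:
    """Return (y - X beta)'(y - X beta)."""
    fitted = mat_vec(X, beta)
    return sum((y_i - f_i) ** 2 for y_i, f_i in zip(y, fitted))


def generalized_r2(X: Matrix, y: Sequence[float],
                   b: Sequence[float], beta0: Sequence[float]) -> float:
    """R^2 = 1 - SSR(b) / SSR(beta0): the fractional drop in squared error
    when the estimates b replace the hypothesized parameters beta0."""
    return 1.0 - sum_sq_residuals(X, y, b) / sum_sq_residuals(X, y, beta0)


# Hypothetical toy data: one z-scored regressor (mean 0, sample sd 1), centered y.
X = [[-1.0], [0.0], [1.0]]
y = [-2.0, 1.0, 1.0]
b = [1.5]  # least squares estimate x'y / x'x = 3 / 2

print(generalized_r2(X, y, b, [0.0]))  # 0.75, the traditional R^2
print(generalized_r2(X, y, b, [1.0]))  # 0.25, relative to the hypothesis beta0 = 1
```

The same $b$ earns a smaller $R^2$ against a hypothesis that already fits reasonably well, since there is less squared error left for $b$ to remove.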
An $R^2$ of 75% means that the in-sample accuracy improves by 75%, in the sense that the sum of squared residuals is 75% smaller, if the data-optimized $b$ solutions are used instead of the hypothesized $\beta_0$ values. In the special case that $\beta_0$ is a vector of zeros, the denominator reduces to $y'y$, the total sum of squares of the centered $y$, and we obtain the traditional $R^2$ again. The individual effect on $R^2$ of deviating from a hypothesis can be computed with $R^{\otimes}$ ("R-outer"). This $p \times p$ matrix, where $p$ is the number of columns of $X$, is given by
$$R^{\otimes}=(X'\tilde{y}_{0})(X'\tilde{y}_{0})'(X'X)^{-1}(\tilde{y}_{0}'\tilde{y}_{0})^{-1},$$
where $\tilde{y}_{0}=y-X\beta_{0}$. The diagonal elements of $R^{\otimes}$ add up exactly to $R^2$. If the regressors are uncorrelated and $\beta_0$ is a vector of zeros, then the $j^{\text{th}}$ diagonal element of $R^{\otimes}$ is simply the $r^2$ value between $x_j$ and $y$. When regressors $x_i$ and $x_j$ are correlated, $R_{ii}^{\otimes}$ might increase at the cost of a decrease in $R_{jj}^{\otimes}$. As a result, the diagonal elements of $R^{\otimes}$ may be smaller than 0 and, in more exceptional cases, larger than 1. To deal with such uncertainties, several shrinkage estimators implicitly take a weighted average of the diagonal elements of $R^{\otimes}$ to quantify the relevance of deviating from a hypothesized value. See the lasso for an example.
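The trace identity and the uncorrelated special case can be checked numerically. The sketch below is pure Python and assumes small dense matrices stored as lists of rows and a full-rank $X$. The helper names (`inverse`, `ols`, `r_outer`, `generalized_r2`) and the toy data are hypothetical, and the inverse is a plain Gauss-Jordan elimination rather than a numerically robust solver. The identity does not depend on standardization. The trace equals $\tilde y_0'X(X'X)^{-1}X'\tilde y_0/\tilde y_0'\tilde y_0$, which is the share of $\tilde y_0'\tilde y_0$ removed by the least squares fit $b$. The toy columns are centered and orthogonal but not scaled to unit variance. When $\beta_0 = 0$, column scaling leaves the diagonal of $R^{\otimes}$ unchanged.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # a matrix stored as a list of rows


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def inverse(A: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        lead = aug[c][c]
        aug[c] = [v / lead for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def column(v: Sequence[float]) -> Matrix:
    return [[float(x)] for x in v]


def residual(X: Matrix, y: Sequence[float], beta: Sequence[float]) -> List[float]:
    """Return y - X beta."""
    return [yi - sum(x * b for x, b in zip(row, beta)) for yi, row in zip(y, X)]


def ols(X: Matrix, y: Sequence[float]) -> List[float]:
    """Least squares estimate b = (X'X)^{-1} X'y."""
    Xt = transpose(X)
    return [r[0] for r in matmul(inverse(matmul(Xt, X)), matmul(Xt, column(y)))]


def r_outer(X: Matrix, y: Sequence[float], beta0: Sequence[float]) -> Matrix:
    """R-outer = (X'y0)(X'y0)'(X'X)^{-1}(y0'y0)^{-1}, with y0 = y - X beta0."""
    y0 = residual(X, y, beta0)
    Xt = transpose(X)
    Xty0 = matmul(Xt, column(y0))          # p x 1
    outer = matmul(Xty0, transpose(Xty0))  # p x p
    M = matmul(outer, inverse(matmul(Xt, X)))
    scale = 1.0 / sum(v * v for v in y0)
    return [[scale * v for v in row] for row in M]


def generalized_r2(X: Matrix, y: Sequence[float],
                   b: Sequence[float], beta0: Sequence[float]) -> float:
    """R^2 = 1 - SSR(b) / SSR(beta0)."""
    def ssr(beta: Sequence[float]) -> float:
        return sum(e * e for e in residual(X, y, beta))
    return 1.0 - ssr(b) / ssr(beta0)


# Hypothetical toy data: two centered, orthogonal regressors and a centered y.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [4.0, 0.0, -2.0, -2.0]
b = ols(X, y)
R = r_outer(X, y, [0.0, 0.0])
print([R[0][0], R[1][1]])  # approximately [1/6, 2/3], the r^2 of each x_j with y
print(R[0][0] + R[1][1], generalized_r2(X, y, b, [0.0, 0.0]))  # both approximately 5/6
```

With correlated columns the individual diagonal entries no longer match the pairwise $r^2$ values, but their sum still equals $R^2$ for any full-rank $X$ and any $\beta_0$.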
=== R² in logistic regression ===
In the case of logistic regression, usually fit by maximum likelihood, there are several choices of pseudo-$R^2$. One is the generalized $R^2$ originally proposed by Cox & Snell, and independently by Magee:

$$R^{2}=1-\left(\frac{\mathcal{L}(0)}{\mathcal{L}(\widehat{\theta})}\right)^{2/n}$$

where $\mathcal{L}(0)$ is the likelihood of the model with only the intercept, $\mathcal{L}(\widehat{\theta})$ is the likelihood of the estimated model (i.e., the model with a given set of parameter estimates) and $n$ is the sample size. It can easily be rewritten as:
$$R^{2}=1-e^{\frac{2}{n}\left(\ln\mathcal{L}(0)-\ln\mathcal{L}(\widehat{\theta})\right)}=1-e^{-D/n}$$
where $D = 2\left(\ln\mathcal{L}(\widehat{\theta})-\ln\mathcal{L}(0)\right)$ is the test statistic of the likelihood ratio test. Nico Nagelkerke noted that it had the following properties:

1. It is consistent with the classical coefficient of determination when both can be computed.
2. Its value is maximised by the maximum likelihood estimation of a model.
3. It is asymptotically independent of the sample size.
4. The interpretation is the proportion of the variation explained by the model.
5. The values are between 0 and 1, with 0 denoting that the model does not explain any variation and 1 denoting that it perfectly explains the observed variation.
6. It does not have any unit.

However, in the case of a logistic model, where $\mathcal{L}(\widehat{\theta})$ cannot be greater than 1, $R^2$ lies between 0 and

$$R_{\max}^{2}=1-(\mathcal{L}(0))^{2/n}.$$

Consequently, Nagelkerke suggested defining a scaled $R^2$ as $R^2/R_{\max}^2$.
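Both forms of the Cox & Snell $R^2$ and the Nagelkerke rescaling can be computed from two log-likelihoods. The sketch below assumes 0/1 outcomes in which both classes occur, and it does not fit a model. The probabilities `p_hat` are hypothetical stand-ins for the output of an already fitted logistic regression, and the function names are illustrative. It relies on one standard fact: the maximum likelihood fit of the intercept-only model predicts the sample proportion for every observation.

```python
import math
from typing import Sequence


def bernoulli_log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """ln L = sum of y_i ln p_i + (1 - y_i) ln(1 - p_i) for 0/1 outcomes."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def intercept_only_log_likelihood(y: Sequence[int]) -> float:
    """ln L(0): the intercept-only ML fit predicts the sample proportion."""
    p_bar = sum(y) / len(y)  # assumes 0 < p_bar < 1
    return bernoulli_log_likelihood(y, [p_bar] * len(y))


def cox_snell_r2(ll_null: float, ll_model: float, n: int) -> float:
    """1 - (L(0)/L(theta_hat))^(2/n), computed on the log scale as 1 - exp(-D/n)."""
    D = 2.0 * (ll_model - ll_null)  # likelihood ratio test statistic
    return 1.0 - math.exp(-D / n)


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Cox & Snell R^2 divided by its upper bound 1 - L(0)^(2/n)."""
    r2_max = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / r2_max


y = [0, 0, 1, 1, 1, 0]
p_hat = [0.2, 0.3, 0.7, 0.8, 0.6, 0.4]  # hypothetical fitted probabilities
n = len(y)
ll0 = intercept_only_log_likelihood(y)
ll1 = bernoulli_log_likelihood(y, p_hat)
direct = 1.0 - (math.exp(ll0) / math.exp(ll1)) ** (2.0 / n)
print(cox_snell_r2(ll0, ll1, n), direct)  # the two forms agree up to rounding
print(nagelkerke_r2(ll0, ll1, n))
```

Working with log-likelihoods and `math.exp(-D / n)` avoids the underflow that the direct likelihood ratio would hit for large $n$. Here half the outcomes are 1, so $\mathcal{L}(0) = 0.5^6$ and $R^2_{\max} = 1 - 0.25 = 0.75$.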
== Comparison with residual statistics ==
Occasionally, residual statistics are used for indicating goodness of fit. The norm of residuals is calculated as the square root of the sum of squares of residuals (SSR):

$$\text{norm of residuals}=\sqrt{SS_{\text{res}}}=\|e\|.$$
Similarly, the reduced chi-square is calculated as the SSR divided by the degrees of freedom. Both $R^2$ and the norm of residuals have their relative merits. For least squares analysis, $R^2$ varies between 0 and 1, with larger numbers indicating better fits and 1 representing a perfect fit. The norm of residuals varies from 0 to infinity, with smaller numbers indicating better fits and zero indicating a perfect fit. One advantage and disadvantage of $R^2$ is that the $SS_{\text{tot}}$ term acts to normalize the value. If the $y_i$ values are all multiplied by a constant, the norm of residuals is also multiplied by the magnitude of that constant, but $R^2$ stays the same. As a basic example, for one linear least squares fit to a small data set, $R^2 = 0.998$ and the norm of residuals is 0.302. If all values of $y$ are multiplied by 1000 (for example, in an SI prefix change), then $R^2$ remains the same, but the norm of residuals becomes 302.

Another single-parameter indicator of fit is the RMSE of the residuals, or standard deviation of the residuals. This would have a value of 0.135 for the above example, given that the fit was linear with an unforced intercept.
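The comparison can be reproduced with a straight-line fit. The example's data table is not present in this copy. The values below (x = 1, ..., 5; y = 1.9, 3.7, 5.8, 8.0, 9.6) are an assumption, chosen because they reproduce the quoted $R^2$, norm of residuals and RMSE. The function name `line_fit_stats` is hypothetical. The sketch treats the degrees of freedom for the reduced chi-square as $n - 2$ (two fitted parameters), which is also an assumption. The RMSE divides by $n$, which matches the quoted 0.135.

```python
import math
from typing import Dict, Sequence


def line_fit_stats(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Fit y = a + b x by least squares and return goodness-of-fit statistics."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # unforced intercept
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "norm_of_residuals": math.sqrt(ss_res),  # ||e||
        "reduced_chi_square": ss_res / (n - 2),  # assumed dof: n minus 2 parameters
        # With a fitted intercept the residuals have mean zero, so this is
        # also their (population) standard deviation.
        "rmse": math.sqrt(ss_res / n),
    }


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.9, 3.7, 5.8, 8.0, 9.6]  # assumed example data
base = line_fit_stats(x, y)
scaled = line_fit_stats(x, [1000.0 * v for v in y])
for name in ("r2", "norm_of_residuals", "rmse"):
    print(f"{name}: {base[name]:.3f} -> {scaled[name]:.3f}")
```

For the original data the three statistics round to 0.998, 0.302 and 0.135. After scaling, $R^2$ is unchanged while the norm of residuals and the RMSE grow by a factor of 1000.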
== History ==
The creation of the coefficient of determination has been attributed to the geneticist Sewall Wright, and it was first published in 1921.
== See also ==
- Anscombe's quartet
- Fraction of variance unexplained
- Goodness of fit
- Nash–Sutcliffe model efficiency coefficient (hydrological applications)
- Pearson product-moment correlation coefficient
- Proportional reduction in loss
- Regression model validation
- Root mean square deviation
- Stepwise regression