
| title | chunk | source | category | tags | date_saved | instance |
|---|---|---|---|---|---|---|
| Coefficient of determination | 5/6 | https://en.wikipedia.org/wiki/Coefficient_of_determination | reference | science, encyclopedia | 2026-05-05T07:23:31.318214+00:00 | kb-cron |

The adjusted $R^2$, denoted $\bar{R}^2$, can be interpreted as an instance of the bias-variance tradeoff. When we consider the performance of a model, a lower error represents a better performance. As the model becomes more complex, the variance increases whereas the squared bias decreases, and these two quantities add up to the total error. Combining these two trends, the bias-variance tradeoff describes a relationship between the performance of the model and its complexity, which is shown as a U-shaped curve in the figure on the right. For $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ specifically, where $n$ is the sample size, the model complexity (i.e. the number of explanatory variables $p$) affects both $R^2$ and the factor $\frac{n-1}{n-p-1}$, so the statistic captures both effects in the overall performance of the model.

$R^2$ can be interpreted as the variance of the model, which is influenced by the model complexity. A high $R^2$ indicates a lower bias error because the model can better explain the change of $Y$ with predictors. For this reason, we make fewer (erroneous) assumptions, and this results in a lower bias error. Meanwhile, to accommodate fewer assumptions, the model tends to be more complex. Based on the bias-variance tradeoff, a higher complexity leads to a decrease in bias and a better performance (below the optimal line). In $\bar{R}^2$, the term $(1 - R^2)$ becomes smaller with higher complexity, resulting in a higher $\bar{R}^2$ and consistently indicating a better performance.

On the other hand, the factor $\frac{n-1}{n-p-1}$ is affected by model complexity in the opposite direction. It increases when regressors are added (i.e. as model complexity increases) and leads to worse performance. Based on the bias-variance tradeoff, a higher model complexity (beyond the optimal line) leads to increasing errors and a worse performance.

Considering the calculation of $\bar{R}^2$, more parameters increase $R^2$ and thus tend to increase $\bar{R}^2$. Nevertheless, adding more parameters also increases the factor $\frac{n-1}{n-p-1}$ and thus tends to decrease $\bar{R}^2$.
These two trends produce an inverted-U relationship between model complexity and $\bar{R}^2$, which is consistent with the U-shaped relationship between model complexity and total error. Unlike $R^2$, which never decreases as model complexity increases, $\bar{R}^2$ increases only when the bias eliminated by the added regressor is greater than the variance it introduces. Using $\bar{R}^2$ instead of $R^2$ can thereby help prevent overfitting.

Following the same logic, adjusted $R^2$ can be interpreted as a less biased estimator of the population $R^2$, whereas the observed sample $R^2$ is a positively biased estimate of the population value. Adjusted $R^2$ is more appropriate when evaluating model fit (the variance in the dependent variable accounted for by the independent variables) and when comparing alternative models in the feature selection stage of model building.

The principle behind the adjusted $R^2$ statistic can be seen by rewriting the ordinary $R^2$ as

$$R^{2} = 1 - \frac{\text{VAR}_{\text{res}}}{\text{VAR}_{\text{tot}}}$$

where

$$\text{VAR}_{\text{res}} = SS_{\text{res}}/n$$

and

$$\text{VAR}_{\text{tot}} = SS_{\text{tot}}/n$$

are the sample variances of the estimated residuals and the dependent variable respectively, which can be seen as biased estimates of the population variances of the errors and of the dependent variable. These estimates are replaced by statistically unbiased versions:

$$\text{VAR}_{\text{res}} = SS_{\text{res}}/(n-p-1)$$

(where $p$ is the number of explanatory variables, excluding the intercept, so that $n-p-1$ is the residual degrees of freedom)

and

$$\text{VAR}_{\text{tot}} = SS_{\text{tot}}/(n-1).$$

Despite using unbiased estimators for the population variances of the errors and of the dependent variable, adjusted $R^2$ is not an unbiased estimator of the population $R^2$, which is defined using the population variances themselves rather than estimates of them; the ratio of two unbiased estimators is in general not an unbiased estimator of the ratio. Ingram Olkin and John W. Pratt derived the minimum-variance unbiased estimator for the population $R^2$, which is known as the Olkin–Pratt estimator. Comparisons of different approaches for adjusting $R^2$ concluded that in most situations either an approximate version of the Olkin–Pratt estimator or the exact Olkin–Pratt estimator should be preferred over (Ezekiel) adjusted $R^2$.
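The relationship between the two pairs of variance estimates can be checked numerically. The following is a minimal sketch, assuming NumPy, an OLS fit with an intercept, and $p$ counting the explanatory variables excluding the intercept (so the residual divisor is $n-p-1$). `r2_scores` is a hypothetical helper, not a library function: it computes $R^2$ from variance estimates divided by $n$ and adjusted $R^2$ from the degrees-of-freedom-corrected versions, checks the latter against $1 - (1 - R^2)\frac{n-1}{n-p-1}$, and then appends pure-noise regressors one at a time. The simulated data (one relevant predictor with coefficient 2, seed 0) are arbitrary illustration choices.

```python
import numpy as np


def r2_scores(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X plus an intercept.

    X has shape (n, p): p explanatory variables, intercept column NOT included.
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])           # prepend the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # OLS coefficients
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())

    # Ordinary R^2: both variances estimated with divisor n (which cancels).
    r2 = 1.0 - (ss_res / n) / (ss_tot / n)
    # Adjusted R^2: unbiased variance estimates with divisors n - p - 1 and n - 1.
    adj_r2 = 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))
    return r2, adj_r2


rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 1))                  # one genuinely relevant predictor
y = 2.0 * X[:, 0] + rng.normal(size=n)

for _ in range(6):
    r2, adj = r2_scores(X, y)
    p = X.shape[1]
    # The variance-based form agrees with 1 - (1 - R^2)(n - 1)/(n - p - 1).
    assert np.isclose(adj, 1 - (1 - r2) * (n - 1) / (n - p - 1))
    print(f"p={p}: R^2={r2:.4f}  adjusted R^2={adj:.4f}")
    X = np.column_stack([X, rng.normal(size=n)])   # append a pure-noise regressor
```

On runs like this, $R^2$ never decreases as noise columns are appended, while adjusted $R^2$ typically falls, since the extra columns add variance without removing bias.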

=== Coefficient of partial determination ===

The coefficient of partial determination can be defined as the proportion of variation that cannot be explained by a reduced model but can be explained by the additional predictors specified in a full model. This coefficient provides insight into whether one or more additional predictors may be useful in a more fully specified regression model. The partial $R^2$ is straightforward to compute once both models have been estimated and their ANOVA tables generated. The partial $R^2$ is

$$\frac{SS_{\text{res, reduced}} - SS_{\text{res, full}}}{SS_{\text{res, reduced}}},$$

which is analogous to the usual coefficient of determination:

$$\frac{SS_{\text{tot}} - SS_{\text{res}}}{SS_{\text{tot}}}.$$
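As a hedged illustration, the partial $R^2$ can be sketched with NumPy by fitting both nested models with ordinary least squares and comparing their residual sums of squares. The helper names `ss_res` and `partial_r2`, and the simulated data (coefficients 2.0 and 0.5, seed 0), are hypothetical choices for this sketch rather than any particular library's API. The last line uses an intercept-only reduced model, whose residual sum of squares equals $SS_{\text{tot}}$, to show the analogy with the usual coefficient of determination.

```python
import numpy as np


def ss_res(design, y):
    """Residual sum of squares of an OLS fit of y on the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def partial_r2(X_reduced, X_full, y):
    """(SS_res,reduced - SS_res,full) / SS_res,reduced for two nested OLS models.

    Both design matrices must already include the intercept column, and the
    columns of X_reduced are assumed to be a subset of those of X_full.
    """
    ss_reduced = ss_res(X_reduced, y)
    ss_full = ss_res(X_full, y)
    return (ss_reduced - ss_full) / ss_reduced


rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

ones = np.ones(n)
reduced = np.column_stack([ones, x1])       # reduced model: intercept + x1
full = np.column_stack([ones, x1, x2])      # full model: intercept + x1 + x2

# Share of the reduced model's unexplained variation that x2 explains.
print(f"partial R^2 for x2: {partial_r2(reduced, full, y):.4f}")

# With an intercept-only reduced model, SS_res,reduced equals SS_tot, so the
# partial R^2 coincides with the ordinary R^2 of the full model.
print(f"R^2 of full model:  {partial_r2(ones[:, None], full, y):.4f}")
```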

=== Generalizing and decomposing $R^2$ ===

As explained above, model selection heuristics such as the adjusted $R^2$ criterion and the F-test examine whether the total $R^2$ increases sufficiently to justify adding a new regressor to the model. If a regressor that is highly correlated with regressors already included is added to the model, the total $R^2$ will hardly increase, even if the new regressor is relevant. As a result, these heuristics will ignore relevant regressors when cross-correlations are high.
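A small simulation can make the effect of high cross-correlation concrete. The sketch below assumes NumPy and uses a hypothetical helper `r_squared`; it builds a regressor `x2` that is nearly a copy of `x1` yet genuinely enters the data-generating model. The noise scale 0.05, the sample size, and the seed are arbitrary assumptions for the example.

```python
import numpy as np


def r_squared(y, *columns):
    """Ordinary R^2 of an OLS fit of y on the given 1-D columns plus an intercept."""
    design = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centred = y - y.mean()
    return 1.0 - float(resid @ resid) / float(centred @ centred)


rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # almost a copy of x1 (correlation near 1)
y = x1 + x2 + rng.normal(size=n)      # x2 genuinely belongs in the true model

r2_x1 = r_squared(y, x1)
r2_both = r_squared(y, x1, x2)
print(f"corr(x1, x2)         = {np.corrcoef(x1, x2)[0, 1]:.4f}")
print(f"R^2 with x1 only     = {r2_x1:.4f}")
print(f"R^2 with x1 and x2   = {r2_both:.4f}")
print(f"increase from adding = {r2_both - r2_x1:.5f}")
```

On data like these, the increase in total $R^2$ from adding `x2` is tiny, so a criterion based only on that increase would tend to leave `x2` out even though it is part of the true model.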

Alternatively, one can decompose a generalized version of $R^2$ to quantify the relevance of deviating from a hypothesis. As Hoornweg (2018) shows, several shrinkage estimators, such as Bayesian linear regression, ridge regression, and the (adaptive) lasso, make use of this decomposition of $R^2$ when they gradually shrink parameters from the unrestricted OLS solutions towards the hypothesized values. Let us first define the linear regression model as

$$y = X\beta + \varepsilon.$$
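Shrinking from the unrestricted OLS solution towards hypothesized values can be sketched with a ridge-type penalty centred on a hypothesis $\beta_0$, i.e. minimizing $\lVert y - Xb\rVert^2 + \lambda\lVert b - \beta_0\rVert^2$, whose minimizer is $(X^\top X + \lambda I)^{-1}(X^\top y + \lambda\beta_0)$. This is a minimal NumPy illustration of that one estimator under the model above, not Hoornweg's decomposition of $R^2$; the function name `shrink_towards`, the simulated coefficients, and the choice $\beta_0 = 0$ are assumptions for the example. The design matrix has no intercept column because the simulated data have none.

```python
import numpy as np


def shrink_towards(X, y, beta0, lam):
    """Ridge-type estimate minimising ||y - X b||^2 + lam * ||b - beta0||^2.

    lam = 0 returns the unrestricted OLS solution (X'X must be invertible);
    as lam grows, the estimate is pulled towards the hypothesised beta0.
    """
    k = X.shape[1]
    A = X.T @ X + lam * np.eye(k)
    return np.linalg.solve(A, X.T @ y + lam * beta0)


rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.5, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)      # y = X beta + eps, no intercept

beta0 = np.zeros(k)                         # hypothesis: every coefficient is zero
for lam in [0.0, 10.0, 100.0, 1e4]:
    b = shrink_towards(X, y, beta0, lam)
    print(f"lambda={lam:>8.1f}: {np.round(b, 3)}")
```

With $\lambda = 0$ the estimate equals the OLS solution, and as $\lambda$ grows the distance between the estimate and $\beta_0$ shrinks monotonically.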