Residual Formulas

Mathematical reference for the common residual calculations used in regression analysis, with definitions, interpretation guidelines, and practical applications.

Basic Residuals

Simple Residual

$$e_i = y_i - \hat{y}_i$$

Where:

  • $e_i$ = residual for observation $i$
  • $y_i$ = observed value for observation $i$
  • $\hat{y}_i$ = predicted value for observation $i$

Applications:

  • Basic model evaluation
  • Error measurement
  • Initial diagnostic analysis
  • Outlier identification

Residual Sum of Squares (RSS)

$$RSS = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Alternative notations:

$$SSE = \sum_{i=1}^{n} e_i^2$$

(Sum of Squared Errors)

Purpose:

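Measures the total variation in the response that the model leaves unexplained. For a fixed dataset, a lower RSS indicates a closer fit. RSS never increases when parameters are added to a least-squares model, so on its own it cannot compare models of different size.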

Mean Squared Error (MSE)

$$MSE = \frac{RSS}{n-p} = \frac{\sum_{i=1}^{n} e_i^2}{n-p}$$

Where:

  • $n$ = number of observations
  • $p$ = number of parameters (including intercept)
  • $n-p$ = residual degrees of freedom
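
The three basic quantities above can be computed together. The following is a minimal sketch, assuming an ordinary least-squares fit with NumPy; the function name `basic_residual_stats` and the toy data are hypothetical and only serve as illustration.

```python
import numpy as np

def basic_residual_stats(X, y):
    """Return raw residuals, RSS, and MSE for an OLS fit of y on X.

    X is assumed to already contain a column of ones for the intercept,
    so p = X.shape[1] counts the intercept as a parameter.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
    e = y - X @ beta        # e_i = y_i - y_hat_i
    rss = float(e @ e)      # RSS = sum of e_i^2
    mse = rss / (n - p)     # MSE = RSS / (n - p)
    return e, rss, mse

# Hypothetical toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])

e, rss, mse = basic_residual_stats(X, y)
print("residuals:", np.round(e, 3))
print("RSS:", round(rss, 4), "MSE:", round(mse, 4))
```

Because the model includes an intercept, the raw residuals sum to zero (up to rounding), which is a quick sanity check on any implementation.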

Standardized Residuals

Standard Residual

$$r_i = \frac{e_i}{s}$$

Where:

  • $r_i$ = standardized residual for observation $i$
  • $e_i$ = raw residual for observation $i$
  • $s$ = residual standard error = $\sqrt{MSE}$

Interpretation Guidelines:

  • $|r_i| < 2$: normal observation
  • $2 \leq |r_i| < 3$: potential outlier
  • $|r_i| \geq 3$: likely outlier
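
As an illustration of the formula and the rule-of-thumb cutoffs above, here is a short sketch assuming an OLS fit with NumPy. The helper names `standardized_residuals` and `classify` and the data are hypothetical.

```python
import numpy as np

def standardized_residuals(X, y):
    """r_i = e_i / s, with s = sqrt(RSS / (n - p))."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s = np.sqrt(e @ e / (n - p))  # residual standard error
    return e / s

def classify(r):
    """Apply the interpretation guidelines to one standardized residual."""
    a = abs(r)
    if a < 2:
        return "normal"
    if a < 3:
        return "potential outlier"
    return "likely outlier"

# Hypothetical data: a straight line with one value shifted upward
x = np.arange(1.0, 11.0)
y = 2.0 * x + 1.0
y[5] += 4.0
X = np.column_stack([np.ones_like(x), x])

r = standardized_residuals(X, y)
for xi, ri in zip(x, r):
    print(f"x={xi:4.1f}  r={ri:6.2f}  {classify(ri)}")
```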

Internally Studentized Residual

$$r_i = \frac{e_i}{s\sqrt{1-h_i}}$$

Where:

  • $h_i$ = leverage of observation $i$
  • $s$ = residual standard error
  • $1-h_i$ = adjustment for leverage

Leverage Formula:

$$h_i = \mathbf{x}_i^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i$$

For simple linear regression: $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum(x_j - \bar{x})^2}$
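
The leverage formula and the internally studentized residual can be sketched as follows, assuming NumPy and a full-rank design matrix with an intercept column. The function names and data are hypothetical; the example also checks the general matrix formula against the simple-regression shortcut.

```python
import numpy as np

def leverages(X):
    """h_i = x_i^T (X^T X)^{-1} x_i, the diagonal of the hat matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    # Row-wise quadratic form: sum over j, k of X[i, j] * XtX_inv[j, k] * X[i, k]
    return np.einsum("ij,jk,ik->i", X, XtX_inv, X)

def internally_studentized(X, y):
    """r_i = e_i / (s * sqrt(1 - h_i))."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s = np.sqrt(e @ e / (n - p))
    h = leverages(X)
    return e / (s * np.sqrt(1.0 - h))

# Hypothetical data; the last x value lies far from the mean
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 12.0])
y = np.array([1.2, 2.1, 2.9, 4.2, 4.8, 12.5])
X = np.column_stack([np.ones_like(x), x])

h = leverages(X)
# Simple-regression shortcut: 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2
h_simple = 1.0 / len(x) + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
r_int = internally_studentized(X, y)

print("leverage:", np.round(h, 3))
print("matches shortcut:", np.allclose(h, h_simple))
print("internally studentized:", np.round(r_int, 3))
```

The leverages sum to $p$, the number of parameters, so their average is $p/n$; points far from $\bar{x}$ receive the largest values.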

Studentized Residuals

Externally Studentized Residual

$$t_i = \frac{e_i}{s_{(i)}\sqrt{1-h_i}}$$

Where:

  • $s_{(i)}$ = residual standard error calculated without observation $i$
  • $h_i$ = leverage of observation $i$

Calculation Steps:

  1. Remove observation $i$ from dataset
  2. Fit model to remaining $n-1$ observations
  3. Calculate $s_{(i)}$ from this reduced model
  4. Apply the formula above, taking $e_i$ and $h_i$ from the fit to the full dataset

Distribution:

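Under the usual assumptions (correctly specified model, independent normal errors with constant variance), $t_i$ follows a $t$-distribution with $n-p-1$ degrees of freedom.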
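
The calculation steps above can be sketched as a brute-force leave-one-out loop. This is an illustrative implementation assuming OLS with NumPy, not an efficient one; the function name and data are hypothetical.

```python
import numpy as np

def externally_studentized(X, y):
    """t_i = e_i / (s_(i) * sqrt(1 - h_i)), refitting once per observation."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                   # full-data residuals
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))  # leverages
    t = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                       # step 1: drop observation i
        X_i, y_i = X[keep], y[keep]
        b_i, *_ = np.linalg.lstsq(X_i, y_i, rcond=None)  # step 2: refit on n-1 rows
        e_i = y_i - X_i @ b_i
        s_i = np.sqrt(e_i @ e_i / (n - 1 - p))         # step 3: s_(i) from reduced fit
        t[i] = e[i] / (s_i * np.sqrt(1.0 - h[i]))      # step 4: apply the formula
    return t

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.1, 2.3, 2.8, 4.5, 4.9, 6.2, 9.0, 8.1])
X = np.column_stack([np.ones_like(x), x])

t = externally_studentized(X, y)
print("externally studentized:", np.round(t, 3))
```

The reduced fit uses $n-1$ observations and $p$ parameters, so $s_{(i)}$ is computed with $n-1-p$ degrees of freedom.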

Relationship Between Residual Types

$$t_i = r_i\sqrt{\frac{n-p-1}{n-p-r_i^2}}$$

This shows the relationship between internally studentized ($r_i$) and externally studentized ($t_i$) residuals.

Practical Note:

When $|r_i|$ is moderate, $t_i \approx r_i$ (the two are exactly equal at $|r_i| = 1$). For large residuals $|t_i|$ exceeds $|r_i|$, and the gap widens as $r_i^2$ approaches $n-p$.
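
The conversion can be tried on a few values to see how the two residual types diverge. This is a small sketch using only the standard library; the function name and the choice of $n = 20$, $p = 2$ are hypothetical.

```python
import math

def external_from_internal(r, n, p):
    """t_i = r_i * sqrt((n - p - 1) / (n - p - r_i^2)); requires r_i^2 < n - p."""
    return r * math.sqrt((n - p - 1) / (n - p - r * r))

n, p = 20, 2  # hypothetical sample size and parameter count
for r in (0.5, 1.0, 2.0, 3.0):
    t = external_from_internal(r, n, p)
    print(f"r = {r:3.1f}  ->  t = {t:6.3f}")
```

The conversion avoids refitting the model $n$ times, since it needs only the internally studentized residuals from the full fit.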

Advanced Formulas

PRESS Residual

$$e_{(i)} = y_i - \hat{y}_{(i)}$$

Where:

  • $e_{(i)}$ = PRESS residual for observation $i$
  • $\hat{y}_{(i)}$ = prediction for $y_i$ using model fitted without observation $i$

Relationship to Ordinary Residual:

$$e_{(i)} = \frac{e_i}{1-h_i}$$

PRESS Statistic:

$$PRESS = \sum_{i=1}^{n} e_{(i)}^2$$

Used for model validation and comparison. A smaller PRESS indicates better leave-one-out predictive performance.
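
To illustrate that the leverage shortcut reproduces the leave-one-out definition, the sketch below computes PRESS residuals both ways for an OLS fit. It assumes NumPy; the function names and the simulated data are hypothetical.

```python
import numpy as np

def press_shortcut(X, y):
    """PRESS residuals from a single fit: e_(i) = e_i / (1 - h_i)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))  # leverages
    e_loo = e / (1.0 - h)
    return e_loo, float(e_loo @ e_loo)

def press_refit(X, y):
    """PRESS residuals by definition: refit without observation i, then predict it."""
    n = len(y)
    e_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        e_loo[i] = y[i] - X[i] @ b
    return e_loo, float(e_loo @ e_loo)

# Hypothetical simulated data, for illustration only
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 15)
y = 3.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)
X = np.column_stack([np.ones_like(x), x])

e_fast, press_fast = press_shortcut(X, y)
e_slow, press_slow = press_refit(X, y)
print("PRESS (shortcut):", round(press_fast, 4))
print("PRESS (refit):   ", round(press_slow, 4))
```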

Recursive Residual

$$w_t = \frac{y_t - \mathbf{x}_t^T\hat{\boldsymbol{\beta}}_{t-1}}{\sqrt{1 + \mathbf{x}_t^T(\mathbf{X}_{t-1}^T\mathbf{X}_{t-1})^{-1}\mathbf{x}_t}}$$

Where:

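  • $w_t$ = recursive residual at time $t$, defined for $t = p+1, \dots, n$ so that the first $t-1$ observations can identify all $p$ parameters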
  • $\hat{\boldsymbol{\beta}}_{t-1}$ = parameter estimates using first $t-1$ observations
  • $\mathbf{X}_{t-1}$ = design matrix for first $t-1$ observations

Applications:

  • Structural break testing
  • Model stability analysis
  • Sequential data analysis
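
A direct, unoptimized sketch of the recursive residual formula follows, assuming NumPy and that the first $p$ rows of the design matrix already have full rank. The function name and simulated data are hypothetical. In practice an updating scheme would avoid refitting at every step, but the refit keeps the correspondence with the formula explicit.

```python
import numpy as np

def recursive_residuals(X, y):
    """Return w_t for t = p+1, ..., n (1-based), refitting on the first t-1 rows."""
    n, p = X.shape
    w = np.empty(n - p)
    for t in range(p, n):                 # 0-based row t is observation t+1
        X_prev, y_prev = X[:t], y[:t]     # first t rows = all earlier observations
        beta_prev, *_ = np.linalg.lstsq(X_prev, y_prev, rcond=None)
        x_t = X[t]
        # Denominator: sqrt(1 + x_t^T (X_prev^T X_prev)^{-1} x_t)
        scale = np.sqrt(1.0 + x_t @ np.linalg.solve(X_prev.T @ X_prev, x_t))
        w[t - p] = (y[t] - x_t @ beta_prev) / scale
    return w

# Hypothetical simulated time-ordered data
rng = np.random.default_rng(1)
x = np.arange(1.0, 13.0)
y = 1.0 + 0.8 * x + rng.normal(0.0, 0.5, x.size)
X = np.column_stack([np.ones_like(x), x])

w = recursive_residuals(X, y)
print("recursive residuals:", np.round(w, 3))
```

A useful check is that the squared recursive residuals sum to the RSS of the full-sample fit, which is the identity that CUSUM-type stability tests build on.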

Partial Residual

$$e_i^{(j)} = e_i + \hat{\beta}_j x_{ij}$$

Where:

  • $e_i^{(j)}$ = partial residual for variable $j$ and observation $i$
  • $e_i$ = ordinary residual
  • $\hat{\beta}_j$ = estimated coefficient for variable $j$
  • $x_{ij}$ = value of variable $j$ for observation $i$

Purpose:

Used in partial residual plots to check linearity of individual predictors in multiple regression.
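
A minimal sketch of computing partial residuals for one predictor, assuming an OLS fit with NumPy. The function name, the column index convention (column 0 is the intercept), and the simulated data are hypothetical.

```python
import numpy as np

def partial_residuals(X, y, j):
    """Return e_i + beta_j * x_ij for column j of X, along with the coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                     # ordinary residuals
    return e + beta[j] * X[:, j], beta

# Hypothetical simulated data with two predictors
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0.0, 10.0, n)
x2 = rng.uniform(0.0, 5.0, n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x1, x2])

pr, beta = partial_residuals(X, y, j=1)

# A straight line fitted to (x1, partial residual) has slope equal to beta_1
slope = np.polyfit(x1, pr, 1)[0]
print("beta_1:", round(float(beta[1]), 4), " slope of partial residuals:", round(float(slope), 4))
```

Plotting `pr` against `x1` gives the partial residual plot; systematic curvature around the fitted line suggests the predictor enters the model nonlinearly.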

Quick Reference Table

| Residual Type | Formula | Main Use | Outlier Threshold |
|---|---|---|---|
| Raw | $e_i = y_i - \hat{y}_i$ | Basic analysis | Context dependent |
| Standardized | $r_i = e_i / s$ | Scale-free comparison | $\lvert r_i \rvert > 2$ |
| Studentized (Internal) | $r_i = e_i / (s\sqrt{1-h_i})$ | Leverage adjustment | $\lvert r_i \rvert > 2$ |
| Studentized (External) | $t_i = e_i / (s_{(i)}\sqrt{1-h_i})$ | Robust outlier detection | $\lvert t_i \rvert > 2.5$ |
| PRESS | $e_{(i)} = e_i / (1-h_i)$ | Cross-validation | Large PRESS values |
