3  Multiple Linear Regression

Linear regression with any number of explanatory variables.


Multiple linear regression allows you to simultaneously take into account the effect of multiple explanatory variables.

Why not just several one-on-one analyses?

A single model with all explanatory variables is usually preferable to fitting several simple linear regressions, because separate analyses are vulnerable to omitted-variable bias, confounding, and Simpson’s paradox, among other problems.

Simple one-on-one analyses (also called correlation analysis) are prone to spurious findings. Multiple regression, on the other hand, allows you to see the effect of one variable given the effects of all other included variables.
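A small simulation can make this concrete. The sketch below is an illustration under made-up assumptions, not an analysis of real data: the variable names, the seed, the sample size and the coefficient of 2 are all arbitrary choices. Here `x2` drives the outcome, while `x1` is merely correlated with `x2` and has no effect of its own.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
n  <- 1000                   # arbitrary sample size

x2 <- rnorm(n)               # the variable that actually drives y
x1 <- x2 + rnorm(n)          # correlated with x2, but with no effect on y
y  <- 2 * x2 + rnorm(n)      # y depends on x2 only

simple   <- lm(y ~ x1)       # one-on-one analysis: x2 is omitted
multiple <- lm(y ~ x1 + x2)  # both explanatory variables in one model

coef(simple)["x1"]           # clearly non-zero: x1 stands in for the omitted x2
coef(multiple)["x1"]         # close to zero once x2 is accounted for
```

In this setup the one-on-one slope for `x1` is about 1 in expectation, even though `x1` has no effect on `y` at all. With `x2` in the model, the estimate for `x1` shrinks towards zero.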

The simple fact is that almost everything correlates to some extent with everything else. Don’t believe me? Here is a correlogram of purely randomly drawn numbers:

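library("corrplot")  # provides corrplot(); the package must be installed
set.seed(1)          # fix the random seed so the plot is reproducible
# 100 independent standard-normal draws: 5 observations (rows) of 20 variables (columns)
X <- matrix(rnorm(100), nrow = 5, ncol = 20)
corrplot(cor(X))     # plot the 20 x 20 correlation matrix of the columns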

Dark blue means a correlation close to \(1\) and dark red close to \(-1\). Note how many ‘strong’ correlations appear among independently drawn values. With only five observations per variable, chance alone easily produces large sample correlations.
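To put a number on the plot, the following sketch counts how many of the pairwise correlations are ‘strong’. The cut-off of 0.5, the comparison sample of 500 observations, and the names `r`, `X_big` and `r_big` are illustrative assumptions, not part of the original example.

```r
set.seed(1)
X <- matrix(rnorm(100), nrow = 5, ncol = 20)  # same data as in the correlogram
R <- cor(X)
r <- R[upper.tri(R)]          # the 190 distinct pairs among 20 variables
mean(abs(r) > 0.5)            # share of pairs with |correlation| above 0.5

# Same exercise with many more observations per variable
X_big <- matrix(rnorm(500 * 20), nrow = 500, ncol = 20)
R_big <- cor(X_big)
r_big <- R_big[upper.tri(R_big)]
mean(abs(r_big) > 0.5)        # share of 'strong' pairs with 500 observations
```

With five observations per variable, a sizeable share of the pairs passes the cut-off purely by chance. With 500 observations, sample correlations between independent variables stay close to zero.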