3  Multiple Linear Regression

Linear regression with any number of explanatory variables.


Multiple linear regression allows you to take into account the effects of multiple explanatory variables simultaneously.

Why not just several one-on-one analyses?

One-on-one measures of association, like a correlation coefficient, are easy to compute and hence often used to explore relationships. However, they fail to reflect the complexity of real-life phenomena: cancer is not caused by a single mutation, and speciation is not triggered by a single trait change.

Correlation analysis has been criticized in many biological fields, including clinical science [1]–[3], ecology [4], and evolutionary biology [5]. Key phenomena that invalidate one-on-one analyses are explained below.

Omitted-variable bias

Estimating the effect of one variable without accounting for the effects of others can lead to a biased result.

Figure 3.1: True causal relationship (left) and biased estimate (right) if \(\boldsymbol{O}\) is not included in the analysis.

In a study on asbestos exposure, smoking may be more common among individuals exposed to asbestos. This would lead to an overestimated effect of asbestos if smoking behavior is not adjusted for. Large systematic reviews on asbestos exposure therefore almost always adjust for smoking [6]–[8].
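
To make the bias concrete, here is a minimal simulation sketch (Python with NumPy; the variable names and effect sizes are invented for illustration, not taken from the asbestos literature). Regressing the outcome on exposure alone inflates the estimated effect; including the omitted variable recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process: smoking is more common among
# the exposed, and both exposure and smoking increase disease risk.
asbestos = rng.normal(size=n)
smoking = 0.8 * asbestos + rng.normal(size=n)
disease = 1.0 * asbestos + 2.0 * smoking + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(n), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print(ols(disease, asbestos)[1])           # ~2.6: biased upward
print(ols(disease, asbestos, smoking)[1])  # ~1.0: the true effect
```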

Confounding

Confounding is a special case of omitted-variable bias: two otherwise unrelated variables can appear correlated because a third, unobserved variable influences both.

Figure 3.2: True causal relationship (left) and biased estimate (right) if \(\boldsymbol{C}\) is not included in the analysis.

Coffee consumption is often linked to lung cancer, but this link is driven by the confounding variable smoking: on average, smokers drink more coffee and are much more likely to develop lung cancer. Large overarching meta-analyses adjusting for smoking typically show that coffee consumption is not a risk factor for lung cancer [9]–[11].
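
The same mechanism is easy to reproduce in a short simulation (again a sketch with invented numbers, not estimates from the cited meta-analyses): a variable with no true effect shows a clearly positive correlation until the confounder is adjusted for.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

smoking = rng.normal(size=n)
coffee = 0.7 * smoking + rng.normal(size=n)   # smokers drink more coffee
cancer = 1.5 * smoking + rng.normal(size=n)   # coffee has no true effect

# One-on-one analysis: a clearly positive, but spurious, correlation.
print(np.corrcoef(coffee, cancer)[0, 1])      # ~0.48

# Multiple regression adjusting for smoking: coffee's coefficient is ~0.
X = np.column_stack([np.ones(n), coffee, smoking])
print(np.linalg.lstsq(X, cancer, rcond=None)[0][1])
```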

Simpson's paradox

Trends can appear, disappear, and even completely reverse direction when another variable is included in the model.

Figure 3.3: Observed trend excluding (left) and including (right) a grouping variable (\(x_2\)).

A study on kidney stones compared two methods for removal: open surgery and percutaneous nephrolithotomy (PN) [12]. Surprisingly, the less invasive PN method displayed a higher success rate than open surgery.

As shown in Table 3.1, this is an example of Simpson’s paradox: separating by kidney stone size reveals that open surgery outperforms PN within each size group, yet not when comparing the totals.

Table 3.1: Success rates of kidney stone removal methods by stone size.

Size            Open Surgery                   Percutaneous Nephrolithotomy
\(< 2\) cm      \(\frac{81}{87}\) (93.1%)      \(\frac{234}{270}\) (86.7%)
\(\geq 2\) cm   \(\frac{192}{263}\) (73.0%)    \(\frac{55}{80}\) (68.8%)
Total           \(\frac{273}{350}\) (78.0%)    \(\frac{289}{350}\) (82.6%)

How is this contradiction possible? Open surgery is used more often in severe cases (i.e., larger stones), which have a lower success rate. This drags its total success rate below that of PN. Failing to account for this second variable (stone size) leads to the wrong conclusion that open surgery performs worse.
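
The reversal can be verified directly from the counts in Table 3.1; the short Python sketch below (the labels open/PN/small/large are just convenience names) recomputes the within-group and total success rates.

```python
# Counts (successes, attempts) from Table 3.1, data from [12].
counts = {
    ("open", "small"): (81, 87),   ("PN", "small"): (234, 270),
    ("open", "large"): (192, 263), ("PN", "large"): (55, 80),
}

for method in ("open", "PN"):
    s1, n1 = counts[(method, "small")]
    s2, n2 = counts[(method, "large")]
    print(f"{method:>4}: small {s1/n1:.1%}, large {s2/n2:.1%}, "
          f"total {(s1 + s2)/(n1 + n2):.1%}")

# open: small 93.1%, large 73.0%, total 78.0%
#   PN: small 86.7%, large 68.8%, total 82.6%
# Open surgery wins within each size group, yet PN wins on the totals.
```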

In contrast to one-on-one analyses, multiple regression can simultaneously take into account the effects of multiple contributing factors. This shows the effect of each variable given the effects of all other included variables, which avoids the phenomena described above.
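
As a minimal sketch of what such an analysis looks like in practice (Python with statsmodels on simulated data; the variable names are placeholders), each fitted slope estimates the effect of one variable while the others are held fixed:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)                         # variable of interest
x2 = rng.normal(size=n)                         # other contributing factor
y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept, x1, x2
fit = sm.OLS(y, X).fit()
print(fit.params)   # ~[2.0, 1.0, -0.5]: each slope is an adjusted
                    # effect, not a one-on-one correlation
```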