Chapter 3 Study Design

This chapter, by Dr. H.G.J. van Mil, explains the statistical considerations in designing experiments or studies so that power and generalizability are maximized.


3.1 Introduction

On day 2, we motivated the use of statistics in empirical sciences like biology. To this end, we introduced the random variable (RV), statistical models, degrees of freedom and residuals. I ended my lectures with two notes:

  • The quality of your analysis critically depends on the quality of your data;
  • The structure of a statistical model can be used as input for the design of an experiment.

Both notes relate to experimental design, a subfield of statistics that links statistical analysis to the acquisition of data by optimizing how the data for a study are collected.

Below we start with a discussion on the relation between a research question and the statistical model. Then we discuss how the statistical model can be linked to the structure of your data set or spreadsheet, and consequently to issues of experimental design. The last screencast discusses some leading concepts in experimental design.

Note that one of the assignments that you need to do during your bachelor research project is the analysis of your own experiment from the perspective of the elements of experimental design discussed today.

3.1.1 From Research Question to Statistical Model

Chapter 2 introduced the concept of a statistical model. The screencast below will show that statistical models are actually the formalization of research questions. If your research question is of a confirmatory nature (i.e. a yes/no question), it can be linked to a statistical test by means of a hypothesis. This concept is also explained in section 7.1 of Introduction to Biostatistics.

In the first step, the research question is translated into a hypothesis, which sets the stage for your experimental design and data analysis. We do not discuss the important skill of asking the right scientific questions; we leave that to the context or field of the study. Still, the creativity and quality of the researcher may already become apparent at this stage.

But let us start the discussion with a somewhat critical, even cynical, note by Sir Ronald Fisher:

“To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.” — Sir Ronald Aylmer Fisher

  • Input: Research question \(\to\) hypothesis \(\to\) statistical model;
  • Input motivates:
    • Identification of relevant objects and variables in the population;
    • Humans … of a certain gender … of a certain age …;
  • Previous steps motivate how to sample the population best;
    • Do I get all ages in my samples?
  • Previous steps motivate how to design my database/data structure;
  • Take into account known and unknown variables (confounders) that might affect the measurements:
    • Make sure you measure the confounding variables you know about;
    • Randomize properly to minimize the average effect that a confounder might have;
  • Reproducibility of the data acquisition process:
    • Can my experiment be repeated?
  • The quality of the data depends critically on the data acquisition process;
  • The quality of the data will affect the complexity of the statistical analysis.
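The randomization step in the list above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the procedure used in the course exercises (those are in R). The subjects, their ages, and the helper `randomize` are hypothetical and exist only for this example. The idea is that random assignment tends to balance a potential confounder such as age across groups, on average.

```python
import random
from statistics import mean


def randomize(units, groups=("control", "treatment"), seed=None):
    """Assign units to groups at random, with equal group sizes where possible."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out over the groups like cards.
    return {unit: groups[i % len(groups)] for i, unit in enumerate(shuffled)}


# Hypothetical subjects, each with an age (the potential confounder).
ages = {f"subject_{i:02d}": age for i, age in enumerate(range(20, 60, 2))}
assignment = randomize(ages, seed=1)

# Compare group size and mean age between the groups.
for group in ("control", "treatment"):
    group_ages = [ages[s] for s, g in assignment.items() if g == group]
    print(group, len(group_ages), round(mean(group_ages), 1))
```

Fixing the seed makes the assignment reproducible, which helps when the data acquisition process has to be documented or repeated.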

Also see: Famous easy to understand examples of a confounding variable invalidating a study

3.1.2 Statistical models, Spreadsheet and Experimental Design

As argued previously, the structure of the statistical model is directly related to the structure of the spreadsheet. A well-formed spreadsheet for statistical analysis should have variables as columns and individual measurements as rows. This format is called tidy. A model is therefore a good starting point for the experimental design stage:

  • Knowing how statistics are linked to the model allows you to optimize the experimental design:
    • Not in the sense of fraud but in efficiency and data quality;
  • Knowing the data types is knowing your method of measurement;
  • Knowing your measurement methods gives you insight into their accuracy:
    • Measurements are sources of variance;
    • This variance might not be of our interest but we could compensate for it;
    • Remember: “statistics is about explaining the variance,” all sources of variance.
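To make the tidy format concrete, the sketch below reshapes a small "wide" sheet into one row per measurement. It is written in Python with the standard library only; the plants, column names, values, and the helper `to_tidy` are all hypothetical. In the tidy result every column corresponds to a variable that could appear in a model such as height explained by treatment and day.

```python
# Hypothetical "wide" sheet: one row per plant, one column per time point.
wide = [
    {"plant": "p1", "treatment": "control", "day_1": 2.1, "day_7": 3.4},
    {"plant": "p2", "treatment": "fertilizer", "day_1": 2.0, "day_7": 4.9},
]


def to_tidy(rows, id_columns, variable_name, value_name):
    """Return one row per measurement: identifier columns, a variable column, a value column."""
    tidy_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_columns}
        for column, value in row.items():
            if column not in id_columns:
                tidy_rows.append({**ids, variable_name: column, value_name: value})
    return tidy_rows


tidy = to_tidy(
    wide,
    id_columns=("plant", "treatment"),
    variable_name="day",
    value_name="height_cm",
)
for row in tidy:
    print(row)
```

Each of the four measurements now has its own row, and `plant`, `treatment`, `day` and `height_cm` are separate columns.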

The next lecture is on experimental design proper.

3.2 Elements of Experimental Design

The aim of experimental design is to reduce, or take into account, the impact of extraneous variables.

If done correctly, the power and precision of the statistical analysis are increased.

Elements of experimental design discussed are:

  • Control;
  • Randomization;
  • Replication;
  • Blocking;
  • Balanced design;
  • Type I and Type II error and power.
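Several of the elements listed above (blocking, randomization, replication and balance) can be combined in a randomized block design. The sketch below is a minimal Python illustration using the standard library; the cages, mice, treatment names, and the function `randomized_block_design` are hypothetical. Within each block, every treatment is used equally often, and the order is randomized.

```python
import random
from collections import Counter


def randomized_block_design(blocks, treatments, seed=None):
    """Randomize treatments within each block so that every block is balanced."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block!r} cannot be balanced")
        # Replicate each treatment equally often, then shuffle within the block.
        labels = list(treatments) * (len(units) // len(treatments))
        rng.shuffle(labels)
        for unit, treatment in zip(units, labels):
            plan[unit] = (block, treatment)
    return plan


# Hypothetical blocks: mice housed in two cages.
blocks = {
    "cage_A": ["m1", "m2", "m3", "m4"],
    "cage_B": ["m5", "m6", "m7", "m8"],
}
plan = randomized_block_design(blocks, ("control", "drug"), seed=42)

# Every (block, treatment) combination occurs equally often.
counts = Counter(plan.values())
for combination, n in sorted(counts.items()):
    print(combination, n)
```

Because cage differences are spread evenly over the treatments, variance between cages can be separated from the treatment effect in the analysis.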

Extraneous variables enter the model as natural variance and as known and unknown variables.

Again, you will be asked to analyze your own bachelor research in the context of these elements of experimental design. For example: how can I justify my sample size? Are my measurements independent?
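One way to reason about sample size is to simulate the experiment many times and count how often the test detects a true effect. The sketch below does this in Python with the standard library. It makes strong simplifying assumptions that are not taken from the course material: normally distributed data, a known standard deviation, and a two-sided two-sample z-test. The effect size of 0.8, the sample sizes, and the function `simulated_power` are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, mean


def simulated_power(n, effect, sd=1.0, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of simulated experiments in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference in means; sd is assumed known.
    se = sd * (2 / n) ** 0.5
    rejections = 0
    for _ in range(n_sim):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(effect, sd) for _ in range(n)]
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim


# Power (1 - Type II error rate) for increasing sample size per group.
for n in (5, 10, 20, 40):
    print("n =", n, "power ~", simulated_power(n, effect=0.8))

# With no true effect, the rejection rate estimates the Type I error rate.
print("Type I error rate ~", simulated_power(20, effect=0.0))
```

With a true effect, the rejection rate estimates power and should rise with the number of replicates per group. With no effect, it should stay close to the chosen alpha.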

Today there are several exercises. There are two R/RStudio exercises, in which most of the code is given but you need to answer several questions related to the computation and the statistical theory. When you are finished, or at the end of the assignment in the afternoon, you can discuss your Bachelor Research Project experiments with each other from the perspective of experimental design.

Assignments:

  • See Brightspace for Rmd file random variables and working with data;
  • Discuss with each other your Bachelor Research Project in the context of experimental design.