This document provides an introduction to linear mixed models through examples and exercises.

An example of why mixed models are useful

We first simulate data that violate an assumption of linear models: independence of residuals.

Data simulation

The following code may appear superficially complicated, but don’t worry about it: this tutorial is about mixed models, not simulations, so you can just copy-paste it to generate the data we will need later on.

library(lme4)
## Loading required package: Matrix
set.seed(1234)
x <- 4 + sort(rnorm(100)) + rnorm(100, 0, 0.15) # predictor; sorted so that blocks will differ in x
block <- rep(x = 1:5, each = 20)                # five blocks of 20 observations each
blockvalues <- c(-2, -1, 0, 1, 2)               # fixed differences among blocks
y <- 8 - x + blockvalues[block] + rnorm(100, 0, 0.5) # the true slope is -1
thorns <- data.frame(response = y, predictor = x, block=block)

Is there an effect of the predictor (let’s pretend thorn density in a bush) on the response (let’s pretend herbivory load on the bush)? Graphically, it looks like there is a positive relationship (more thorns means more herbivory!?):

plot(thorns$predictor, thorns$response, ylab = "Herbivory load", xlab = "Thorn density")
abline(lm(response ~ predictor, data = thorns), lwd = 5, col = "gray") # a shortcut to draw the regression line

The summary of a linear model (simple linear regression) also indicates strong evidence for a positive relationship between the response and the predictor.

summary(lm(response~ predictor, data=thorns))
## 
## Call:
## lm(formula = response ~ predictor, data = thorns)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.11617 -0.51831  0.08215  0.54349  1.51932 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.2564     0.2854   11.41  < 2e-16 ***
## predictor     0.2524     0.0717    3.52 0.000657 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7252 on 98 degrees of freedom
## Multiple R-squared:  0.1122, Adjusted R-squared:  0.1032 
## F-statistic: 12.39 on 1 and 98 DF,  p-value: 0.0006566

As always, we should check the assumptions of our model to be confident about the parameter estimates and their uncertainty (in particular, their p-values). The main assumptions are:

* A Gaussian distribution of the residuals
* A constant variance of the residuals
* Independence of the residuals

You can have a look at some diagnostic graphs using plot(lm()):

plot(lm(response~ predictor, data=thorns))

You should notice that the first two assumptions are more or less OK, but that something is wrong with the third one: there are trends in the residuals, so that knowing the value of one residual lets you more or less predict where the next one should be. That is a serious warning. But you will not always be that lucky: often these graphs will not reveal such strong patterns even when the problem is acute. Instead, you should think about how you collected the data and what may cause the residuals to be non-independent.

Let’s pretend that we collected the data from five different locations (blocks). What if we visualize the differences between locations?

plot(thorns$predictor, thorns$response, col = thorns$block, ylab = "Herbivory load", xlab = "Thorn density")
abline(lm(response~ predictor, data=thorns), lwd=5, col="gray")

It looks like there is a negative effect of thorn density within blocks, but that it is hidden by among-block differences!

The non-independence of residuals is caused by differences in herbivory among blocks, which hide the real effect of thorns on herbivory. (NB: don’t get confused; in this case the blocks also differ in the predictor (thorn density), but that is not important: the problem is the differences in the response.)
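
To see the problem directly, we can plot the residuals of the simple regression against block (a sketch; the object name simpleLM is ours):

simpleLM <- lm(response ~ predictor, data = thorns)
boxplot(residuals(simpleLM) ~ thorns$block, xlab = "Block", ylab = "Residual") # residuals cluster by block, violating independence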

How to correct for this problem statistically?

One option would be to fit five separate regressions, one per block (see the sketch below). However, that would substantially reduce the statistical power, and you would likely fail to detect the relationship in some blocks even if it is present in all of them.
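
For illustration, here is a sketch of that per-block approach using lmList() from lme4 (the object name perblock is ours):

perblock <- lmList(response ~ predictor | block, data = thorns) # one lm() per block
coef(perblock) # five intercepts and five slopes, each estimated from only 20 points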

A much better option is to correct for block, either with a fixed factor:

summary(lm(response~ predictor + as.factor(block), data=thorns))
## 
## Call:
## lm(formula = response ~ predictor + as.factor(block), data = thorns)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.68769 -0.28889  0.04982  0.24924  1.39683 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         6.0471     0.3872  15.617  < 2e-16 ***
## predictor          -0.9752     0.1414  -6.899 6.02e-10 ***
## as.factor(block)2   0.9400     0.1762   5.334 6.62e-07 ***
## as.factor(block)3   1.9958     0.2147   9.295 5.80e-15 ***
## as.factor(block)4   2.9706     0.2812  10.562  < 2e-16 ***
## as.factor(block)5   3.7674     0.4219   8.929 3.48e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4875 on 94 degrees of freedom
## Multiple R-squared:  0.6151, Adjusted R-squared:  0.5947 
## F-statistic: 30.05 on 5 and 94 DF,  p-value: < 2.2e-16

or with a random effect:

summary(thornLMM <- lmer(response ~ predictor + (1|block), data=thorns))
## Linear mixed model fit by REML ['lmerMod']
## Formula: response ~ predictor + (1 | block)
##    Data: thorns
## 
## REML criterion at convergence: 165.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4884 -0.6059  0.1091  0.5234  2.8735 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  block    (Intercept) 2.129    1.4590  
##  Residual             0.238    0.4878  
## Number of obs: 100, groups:  block, 5
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   7.7652     0.8441   9.199
## predictor    -0.9189     0.1385  -6.633
## 
## Correlation of Fixed Effects:
##           (Intr)
## predictor -0.632

This is what happens in both cases:

The regression line is vertically shifted among blocks, but all data points contribute to estimating the slope.
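
One way to draw these shifted lines from the mixed model (a sketch; the object name blockcoefs is ours) is to extract the block-specific intercepts and the common slope with coef():

blockcoefs <- coef(thornLMM)$block # one row per block: a block-specific intercept and the shared slope
plot(thorns$predictor, thorns$response, col = thorns$block, ylab = "Herbivory load", xlab = "Thorn density")
for (i in 1:5) abline(a = blockcoefs[i, 1], b = blockcoefs[i, 2], col = i)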

A more extreme example, where a random effect is really better than fixed effects

set.seed(1234)
sampsize <- 300
x <- rnorm(sampsize, mean = 4, sd = 0.25)
nbblocks <- 50
block <- sample(x = 1:nbblocks, size = sampsize, replace = TRUE) # blocks of random, unbalanced sizes
blockvalues <- rnorm(n = nbblocks, mean = 0, sd = 3)             # random block effects
y <- 8 - x + blockvalues[block] + rnorm(sampsize, 0, 1)          # the true slope is -1
thornsmany <- data.frame(response = y, predictor = x, block=block)

The true effect of the predictor on the response is -1.

summary(lm(response ~ predictor, data=thornsmany))
## 
## Call:
## lm(formula = response ~ predictor, data = thornsmany)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.5550 -2.9980  0.1568  2.3446  9.9684 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.7448     3.4909   3.651 0.000309 ***
## predictor    -2.0480     0.8703  -2.353 0.019263 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.779 on 298 degrees of freedom
## Multiple R-squared:  0.01824,    Adjusted R-squared:  0.01495 
## F-statistic: 5.538 on 1 and 298 DF,  p-value: 0.01926

A simple linear regression greatly over-estimates the effect (about -2 instead of -1), and the standard error is very large.

Compare the linear model with block as a factor, and the mixed model with block as a random effect:

summary(lm(response ~ predictor + as.factor(block), data=thornsmany))

summary(lmer(response ~ predictor + (1|block), data=thornsmany))

They both provide quite a good estimate of the effect of the predictor, with rather small standard errors (in fact, the estimates and standard errors should be almost identical). However, the linear model is a mess: there are many parameters we don’t care about. These blocks differ for many reasons we know nothing about. Does it matter whether block 4 is significantly different from block 1 while block 5 is not? Probably not. The mixed model returns a single variance component for block instead of 49 estimates for the different levels of the factor; that variance component is a simple measure of how much blocks differ.
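
You can contrast the two outputs directly (a sketch; the object name thornsmanyLMM is ours):

thornsmanyLMM <- lmer(response ~ predictor + (1|block), data = thornsmany)
VarCorr(thornsmanyLMM) # a single standard deviation for block, plus the residual
length(coef(lm(response ~ predictor + as.factor(block), data = thornsmany))) # 51 coefficients: intercept, slope, and 49 block contrasts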

Testing random effect significance

Likelihood Ratio Test (LRT): a comparison of two nested models. Twice the log of the likelihood ratio (i.e., twice the difference in log-likelihoods) \(\sim \chi^2\).

thornLMM <- lmer(response ~ predictor + (1|block), data = thorns)
thornLM <- lm(response ~ predictor, data = thorns)
anova(thornLMM, thornLM) # the mixed model MUST GO FIRST
## refitting model(s) with ML (instead of REML)
## Data: thorns
## Models:
## thornLM: response ~ predictor
## thornLMM: response ~ predictor + (1 | block)
##          Df    AIC    BIC   logLik deviance  Chisq Chi Df Pr(>Chisq)    
## thornLM   3 223.50 231.31 -108.749   217.50                             
## thornLMM  4 172.05 182.47  -82.027   164.05 53.445      1   2.66e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
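
To see what anova() computes, here is a manual check (a sketch; note that anova() refits the models with ML, hence refitML()):

ll1 <- logLik(refitML(thornLMM)) # ML log-likelihood of the mixed model
ll0 <- logLik(thornLM)           # lm() already maximizes the likelihood
(chi2 <- as.numeric(2 * (ll1 - ll0))) # matches the Chisq above
pchisq(chi2, df = 1, lower.tail = FALSE) # matches Pr(>Chisq)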

One simulation with a random effect that has no effect

set.seed(1234)
RandomVariance <- 0 # the true block variance is zero
sampsize <- 200
x <- rnorm(sampsize,mean = 4, sd=0.25)
nbblocks <- 50
block <- sample(x = 1:nbblocks, size = sampsize, replace = TRUE)
blockvalues <- rnorm(n = nbblocks, mean = 0, sd = sqrt(RandomVariance))
y <- 8 - x + blockvalues[block] + rnorm(sampsize,0,1)
dat <- data.frame(response = y, predictor = x, block=block)
lm0 <- lm(response ~ 1 + predictor, data=dat)
lmm0 <- lmer(response ~ 1 + predictor + (1|block), data=dat )
(LRT0 <- anova(lmm0, lm0)) #mixed model must come first!
LRT0$`Pr(>Chisq)`[2] # the p-value

Exercise: the p-value is wrong for the LRT

Replicate the simulations to obtain the distribution of p-values under the null model of no block variance. One possible approach is sketched below.
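
A sketch of one solution (the function name simulate_pvalue is ours; expect messages about singular fits, since the true variance is zero):

simulate_pvalue <- function(sampsize = 200, nbblocks = 50, RandomVariance = 0) {
  x <- rnorm(sampsize, mean = 4, sd = 0.25)
  block <- sample(x = 1:nbblocks, size = sampsize, replace = TRUE)
  blockvalues <- rnorm(n = nbblocks, mean = 0, sd = sqrt(RandomVariance))
  y <- 8 - x + blockvalues[block] + rnorm(sampsize, 0, 1)
  dat <- data.frame(response = y, predictor = x, block = block)
  lmm <- lmer(response ~ 1 + predictor + (1|block), data = dat)
  lm0 <- lm(response ~ 1 + predictor, data = dat)
  anova(lmm, lm0)$`Pr(>Chisq)`[2] # mixed model must come first
}
pvalues <- replicate(1000, suppressMessages(simulate_pvalue()))
hist(pvalues) # not uniform: too few small p-values
mean(pvalues < 0.05) # well below the nominal 0.05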

There are too few significant results. It may seem like a good thing that there are fewer false positives than expected, but it really is not: it means that the p-value does not have the meaning it should.

The problem is that a variance cannot be negative. You can check the confidence interval from our models:

confint(lmm0) #Confidence interval

* LRTs are two-sided tests that count one full parameter per random effect.
* A random effect is really half a parameter and should be tested with a one-sided test.

The simple fix: divide the p-values by two.

The same problem occurs with AIC/BIC: count only half a parameter per random effect, that is, remove one IC point per random effect.
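
A minimal sketch of both corrections, applied to lmm0 and LRT0 from above:

LRT0$`Pr(>Chisq)`[2] / 2 # one-sided correction: halve the LRT p-value
AIC(refitML(lmm0)) - 1   # AIC charged a full parameter for the block variance; remove one IC point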

NB: it is more complicated with random interactions, but the rule is to count half a parameter per variance parameter.