Today we learn:
Thinking about the structure of residual variance is an integral part of critical thinking:
Random effects are a powerful way to capture unexplained patterns and build more robust inference
How to interpret mixed effect model output
How to visualise mixed effect model results
How to test the significance of random effects
Packages for today:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.2 ✓ dplyr 1.0.6
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(glmmTMB)
library(emmeans)
library(DHARMa)
## This is DHARMa 0.4.1. For overview type '?DHARMa'. For recent changes, type news(package = 'DHARMa') Note: Syntax of plotResiduals has changed in 0.3.0, see ?plotResiduals for details
Data for today:
download.file(url = "https://timotheenivalis.github.io/workshops/Stats21/Data.zip",
destfile = "Data.zip")
unzip(zipfile = "Data.zip")
General guidelines:
Define your scientific question(s)
Visualise your data in a way that answers your question
Fit corresponding model to put numbers on what you see
Check assumptions and think about how much you trust your results
Last week we found that different genotypes responded differently to water conditions, that is, there was an interaction between genotype and water condition.
(NB: I have tweaked the dataset a little to make the patterns a bit clearer; so don’t worry if you see slight changes in the plots compared to last week!)
drought_data <- read_csv("Data/droughtdata2.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## plant = col_double(),
## Genotype = col_character(),
## WaterCondition = col_character(),
## Temperature = col_double()
## )
ggplot(drought_data, aes(x = WaterCondition, y= Temperature, color=Genotype)) +
geom_boxplot()
m_inter <- glmmTMB(Temperature ~ Genotype * WaterCondition, data = drought_data)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
summary(m_inter)
## Family: gaussian ( identity )
## Formula: Temperature ~ Genotype * WaterCondition
## Data: drought_data
##
## AIC BIC logLik deviance df.resid
## 141.5 148.8 -65.7 131.5 27
##
##
## Dispersion estimate for gaussian family (sigma^2): 3.56
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 32.3875 0.6672 48.54 < 2e-16 ***
## GenotypeWT 6.0375 0.9435 6.40 1.57e-10 ***
## WaterConditionNormal -0.2250 0.9435 -0.24 0.812
## GenotypeWT:WaterConditionNormal -6.3125 1.3343 -4.73 2.24e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
inter_prediction <- as_tibble(emmeans(m_inter, ~ WaterCondition * Genotype))
ggplot(drought_data, aes(x = WaterCondition, y= Temperature, color=Genotype)) +
geom_jitter(alpha=0.3, width=0.1, height = 0) +
geom_point(data=inter_prediction, aes(y=emmean), size=3, position = position_dodge(width = 1)) +
geom_errorbar(data=inter_prediction,
aes(ymin=lower.CL, ymax=upper.CL,x=WaterCondition, color=Genotype),
width=0.1, inherit.aes = FALSE, position = position_dodge(width = 1))
We ignored one thing though: temperature was measured on leaves from the same plants. For each treatment we have 8 data points but have only use two plants. This exposes us to a potential problem:
ggplot(drought_data, aes(x=interaction(Genotype, WaterCondition),
y=Temperature, color=as.factor(plant)))+
geom_point()+
xlab("Genotype-by-exposure") + scale_color_brewer(palette="Dark2")
We do not really know what properties of the plants (apart from genotype and watering condition) may produce differences in temperature, and we have not measured much else about them. That is a typical case when to use a random effect to model individual differences.
A random intercept effect of plant asks the model to partition the residual variance into differences among plants and differences within plants. If plants are different from each other on average then we need to account for it to obtain a result that we may generalise to other plants.
m_inter_RE <- glmmTMB(Temperature ~ 1 + Genotype*WaterCondition + (1|plant), data = drought_data)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
summary(m_inter_RE)
## Family: gaussian ( identity )
## Formula: Temperature ~ 1 + Genotype * WaterCondition + (1 | plant)
## Data: drought_data
##
## AIC BIC logLik deviance df.resid
## 107.0 115.8 -47.5 95.0 26
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## plant (Intercept) 3.0509 1.7467
## Residual 0.5101 0.7142
## Number of obs: 32, groups: plant, 8
##
## Dispersion estimate for gaussian family (sigma^2): 0.51
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 32.388 1.261 25.691 < 2e-16 ***
## GenotypeWT 6.037 1.783 3.387 0.000708 ***
## WaterConditionNormal -0.225 1.783 -0.126 0.899569
## GenotypeWT:WaterConditionNormal -6.312 2.521 -2.504 0.012290 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the model correctly captured differences in average plant temperatures by looking at estimated plant deviations (they are called BLUPs = Best Linear Unbiased Predictions):
ranef(m_inter_RE)$cond$plant
The interaction is still apparent but there is much more uncertainty about its strength: the SE and p-value are much larger. The variance captured by plant is also very large: 1.75, vs. 0.71 for the residual. On average there is more differences among plants mean temperatures than within plant measurements.
Let’s visualise the different predictions made by the models with and without random effects (the code is not great, don’t worry to much about it, the point is just to create a plot to illustrate the statistical patter):
inter_RE_prediction <- as_tibble(emmeans(m_inter_RE, ~ WaterCondition * Genotype))
inter_RE_prediction$RandomEffect <- TRUE
inter_prediction$RandomEffect <- FALSE
predictionsRE <- rbind(inter_prediction, inter_RE_prediction)
ggplot(drought_data, aes(x=interaction(Genotype, WaterCondition),
y=Temperature, color=as.factor(plant))) +
geom_jitter(alpha=0.8, width=0.1, height = 0) +
geom_point(data=predictionsRE, aes(y=emmean,x=interaction(Genotype, WaterCondition), stre=RandomEffect),
size=3, inherit.aes = FALSE, position = position_dodge(width = 1))+
geom_errorbar(data=predictionsRE,
aes(ymin=lower.CL, ymax=upper.CL, x=interaction(Genotype, WaterCondition), stre=RandomEffect),
width=0.1, inherit.aes = FALSE, position = position_dodge(width = 1)) +
xlab("Genotype-by-exposure") + scale_color_brewer(palette="Dark2")
## Warning: Ignoring unknown aesthetics: stre
## Warning: Ignoring unknown aesthetics: stre
Here including a random effect changes very little the predicted means, but changes considerably the error bars and the p-value. We still find evidence of an interaction but we have much less confidence about how strong it is. That makes sense, after all our experiment considers only 2 plants per treatment! It is very possible these two plants are not representative!
To convince ourselves the model captures variation well, we can also plot plant specific predictions, but we need the predict function for that:
newdatadrought <- drought_data[!duplicated(drought_data$plant),]
#make a prediction for each random effect level ("conditional"):
preoutput <- predict(m_inter_RE, newdata =newdatadrought,
se.fit = TRUE, re.form = NULL)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
newdatadrought$Temperature <- preoutput$fit
newdatadrought$lower.CL <- preoutput$fit - 2*preoutput$se.fit
newdatadrought$upper.CL <- preoutput$fit + 2*preoutput$se.fit
#reproduce the emmeans result by averaging across the random effect level ("marginal"):
predict(m_inter_RE, newdata = expand_grid(WaterCondition=unique(drought_data$WaterCondition),
Genotype=unique(drought_data$Genotype),
plant=NA), se.fit = TRUE, re.form = NA)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## $fit
## mu_predict mu_predict mu_predict mu_predict
## 31.8875 32.1625 38.4250 32.3875
##
## $se.fit
## mu_predict mu_predict mu_predict mu_predict
## 1.260634 1.260634 1.260634 1.260634
ggplot(drought_data, aes(x=interaction(Genotype, WaterCondition),
y=Temperature, color=as.factor(plant))) +
geom_jitter(alpha=0.8, width=0.1, height = 0) +
geom_point(data=predictionsRE, aes(y=emmean,x=interaction(Genotype, WaterCondition), stre=RandomEffect),
size=3, inherit.aes = FALSE, position = position_dodge(width = 1))+
geom_errorbar(data=predictionsRE,
aes(ymin=lower.CL, ymax=upper.CL, x=interaction(Genotype, WaterCondition), stre=RandomEffect),
width=0.1, inherit.aes = FALSE, position = position_dodge(width = 1)) +
geom_point(data=newdatadrought, shape=17, size=3, aes(x=interaction(Genotype, WaterCondition),
y=Temperature, color=as.factor(plant)), inherit.aes=FALSE)+
geom_errorbar(data=newdatadrought, aes(x=interaction(Genotype, WaterCondition),
ymin=lower.CL, ymax=upper.CL, color=as.factor(plant)), inherit.aes=FALSE, width=0.1)+
xlab("Genotype-by-exposure") + scale_color_brewer(palette="Dark2")
## Warning: Ignoring unknown aesthetics: stre
## Warning: Ignoring unknown aesthetics: stre
Including a random effect shows us that we are very confident about the average value for each plant, but not for the average value for a broader unobserved population.
By the way, this final model shows some small problems with residual distributions:
plot(simulateResiduals(m_inter_RE))
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
It may be that there is more variation in drought conditions than in normal conditions. Perhaps we could still improve the model by accounting for unequal variances, but given our small sample size that would probably be difficult. Here I think the next thing to do would rather be to collect more data, across more plants.
Imagine you have measured how much a plant species is attacked by big herbivores as a function of how many thorns the plant grows on stems. You have collected data in 5 locations and visualise the relationship between herbivory and quantity of thorns
thorndata <- read_csv("Data/thorndata.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## herbivory = col_double(),
## thorndensity = col_double(),
## site = col_character()
## )
ggplot(aes(x=thorndensity, y= herbivory), data = thorndata) + geom_point()+
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
Counter-intuitive relationship.
m0 <- glmmTMB(herbivory ~thorndensity , data=thorndata)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
summary(m0)
## Family: gaussian ( identity )
## Formula: herbivory ~ thorndensity
## Data: thorndata
##
## AIC BIC logLik deviance df.resid
## 223.5 231.3 -108.7 217.5 97
##
##
## Dispersion estimate for gaussian family (sigma^2): 0.515
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.25642 0.28249 11.528 < 2e-16 ***
## thorndensity 0.25237 0.07098 3.556 0.000377 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Counter-intuitive result is confirmed.
Residual checks:
plot(simulateResiduals(m0))
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
No significant deviation, but patterns apparent in residuals.
It turns out, the data were collected across five locations!
ggplot(aes(x=thorndensity, y= herbivory, color=site), data = thorndata) + geom_point()+
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
There is a very strong pattern of differences among sites in both the predictor and the response. We don’t really know why, we don’t really care about the sites (we don’t have hypotheses about why sites differ), so we can probably model site effectively with a random effect:
mm1 <- glmmTMB(herbivory ~ thorndensity + (1|site), data=thorndata)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
summary(mm1)
## Family: gaussian ( identity )
## Formula: herbivory ~ thorndensity + (1 | site)
## Data: thorndata
##
## AIC BIC logLik deviance df.resid
## 172.1 182.5 -82.0 164.1 96
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## site (Intercept) 1.6477 1.2836
## Residual 0.2358 0.4856
## Number of obs: 100, groups: site, 5
##
## Dispersion estimate for gaussian family (sigma^2): 0.236
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.7080 0.8014 9.618 < 2e-16 ***
## thorndensity -0.9041 0.1447 -6.247 4.2e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now the relationship makes sense!
Let’s visualise model predictions:
pred_mm1 <- as_tibble(
emmeans(
mm1, ~ thorndensity, at=list(
thorndensity=seq(min(thorndata$thorndensity), max(thorndata$thorndensity), length.out=100)
)
)
)
ggplot(aes(x=thorndensity, y= herbivory, color=site), data = thorndata) +
geom_point()+
geom_line(data = pred_mm1, aes(y=emmean, x=thorndensity), inherit.aes = FALSE)+
geom_ribbon(data = pred_mm1, aes(ymin=lower.CL, ymax=upper.CL, x=thorndensity), inherit.aes = FALSE, alpha=0.1)
emmeans is not great with random effects because it can show only the relationship across sites (as if we did not know where we observe the data). The uncertainty includes variance among random effects.
To work with mixed models predict() may be better, because we can see the relationship within each site:
newdata <- expand_grid(thorndensity = seq(min(thorndata$thorndensity), max(thorndata$thorndensity), length.out=100),
site=unique(thorndata$site))
predoutput <- predict(mm1, newdata = newdata, se.fit = TRUE)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
newdata$herbivory <- predoutput$fit
newdata$lower.CL <- predoutput$fit - 2*predoutput$se.fit
newdata$upper.CL <- predoutput$fit + 2*predoutput$se.fit
ggplot(aes(x=thorndensity, y= herbivory, color=site), data = thorndata) +
geom_point()+
geom_line(data=newdata) +
geom_ribbon(data = newdata, aes(x=thorndensity, ymin=lower.CL, ymax=upper.CL, fill=site), inherit.aes = FALSE, alpha=0.1)
Snow vole (that is a rodent) data. What explains variation in body mass?
voles <- read_csv("Data/voles.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## id = col_character(),
## mass = col_double(),
## body_length = col_double(),
## tail_length = col_double(),
## sex = col_character(),
## age = col_character(),
## year = col_double(),
## x_coord = col_double(),
## y_coord = col_double(),
## day = col_double(),
## inbreeding = col_double()
## )
voles <- na.omit(voles)
Last week we got this model:
m_vole1 <- glmmTMB(mass ~ 1 + body_length + age, data=voles)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
summary(m_vole1)
## Family: gaussian ( identity )
## Formula: mass ~ 1 + body_length + age
## Data: voles
##
## AIC BIC logLik deviance df.resid
## 12461.7 12484.6 -6226.9 12453.7 2215
##
##
## Dispersion estimate for gaussian family (sigma^2): 16
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -27.64520 1.62558 -17.01 <2e-16 ***
## body_length 0.59794 0.01393 42.92 <2e-16 ***
## ageJ -5.06032 0.30768 -16.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pred_voles1 <- as_tibble(emmeans(m_vole1, spec = ~ age + body_length,
at=list(body_length=seq(min(voles$body_length, na.rm = TRUE),
max(voles$body_length, na.rm = TRUE)))))
pred_voles1 <- pred_voles1 %>% filter((age=="A" & body_length>100 ) | (age=="J" & body_length<120) )
ggplot(voles, aes(x=body_length, y=mass, color=age)) + geom_point() +
geom_line(data = pred_voles1, aes(y=emmean), size=1) +
geom_ribbon(data = pred_voles1, aes(ymin=lower.CL, ymax=upper.CL, x=body_length, fill=age),
inherit.aes = FALSE, alpha=0.3)
That model is maybe not reliable because it omits several sources of non-independence between residuals. In particular we measured the same individuals repeatedly and we measured individuals across several years.
voles$resid1 <- residuals(m_vole1)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
voles %>%
group_by(year) %>%
summarise(mean(resid1, na.rm=TRUE))
voles %>%
group_by(id) %>%
summarise(indmean=mean(resid1, na.rm=TRUE)) %>%
ggplot(aes(x=indmean)) + geom_density()
voles %>%
group_by(id) %>%
summarise(indmean=mean(resid1, na.rm=TRUE), yearmean=mean(year)) %>%
ggplot(aes(x=yearmean, y=indmean)) + geom_point()
We can try to account for those sources of structure with random effects:
m_voles2 <- glmmTMB(mass ~ 1 + body_length + age + (1|id) + (1|year), data=voles)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
summary(m_voles2)
## Family: gaussian ( identity )
## Formula: mass ~ 1 + body_length + age + (1 | id) + (1 | year)
## Data: voles
##
## AIC BIC logLik deviance df.resid
## 12108.8 12143.0 -6048.4 12096.8 2213
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## id (Intercept) 4.2607 2.0641
## year (Intercept) 0.4321 0.6573
## Residual 10.9173 3.3041
## Number of obs: 2219, groups: id, 677; year, 7
##
## Dispersion estimate for gaussian family (sigma^2): 10.9
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.64876 1.65228 -11.89 <2e-16 ***
## body_length 0.52935 0.01404 37.70 <2e-16 ***
## ageJ -6.25393 0.31801 -19.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The parameter estimates and standard errors changed very little, but the variance components seem quite large, so we are capturing residual variation.
Does it mean that on some years voles were on average heavier (given their length)? And that some individuals are consistently heavier through measurements?
To test these ideas we need to learn new tools to work with uncertainty in random effect parameters.
Likelihood Ratio Test (LRT): comparing a model with and one without the random effect of interest:
m_voles2_minusYear <- glmmTMB(mass ~ 1 + body_length + age + (1|id), data=voles)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
anova(m_voles2, m_voles2_minusYear)
m_voles2_minusId <- glmmTMB(mass ~ 1 + body_length + age + (1|year), data=voles)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
anova(m_voles2, m_voles2_minusId)
Year effects
newdata2 <- expand_grid(body_length=seq(min(voles$body_length, na.rm = TRUE),
max(voles$body_length, na.rm = TRUE)), age=c("A", "J"),
year=unique(voles$year), id=voles$id[1])
predoutput <- predict(m_voles2, newdata = newdata2, se.fit = TRUE, re.form = NULL)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
newdata2$mass <- predoutput$fit
newdata2$lower.CL <- predoutput$fit - 2* predoutput$se.fit
newdata2$upper.CL <- predoutput$fit
newdata2 <- newdata2 %>% filter((age=="A" & body_length>100 ) | (age=="J" & body_length<120) )
ggplot(voles, aes(x=body_length, y=mass, color=as.factor(year)) ) + geom_point() +
geom_line(data = newdata2, aes(x=body_length, color=as.factor(year)) ) + facet_wrap(~age, scales = "free") #+
#geom_ribbon(data = newdata2, aes(ymin=lower.CL, ymax=upper.CL, x=body_length, fill=as.factor(year)),
# inherit.aes = FALSE, alpha=0.1)
newdata2b <- expand_grid(body_length=seq(min(voles$body_length, na.rm = TRUE),
max(voles$body_length, na.rm = TRUE)), age=c("A", "J"),
year=voles$year[1], id=unique(voles$id))
newdata2b$mass <- predict(m_voles2, newdata = newdata2b, re.form = NULL)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
newdata2b <- newdata2b %>% filter((age=="A" & body_length>100 ) | (age=="J" & body_length<120) )
ggplot(voles, aes(x=body_length, y=mass) ) + geom_point() +
geom_line(data = newdata2b, aes(x=body_length, color=id), alpha=0.1 ) +
facet_wrap(~age, scales = "free") +
theme(legend.position = "none")
Does sex-difference varies among years?
ggplot(voles, aes(x=body_length, y=mass, color=interaction(age,sex)) ) +
geom_point(alpha=0.1)+
geom_smooth(method = "lm") +
facet_wrap(~year)
## `geom_smooth()` using formula 'y ~ x'
It looks pretty consistent.
m_voles3 <- glmmTMB(mass ~ 1 + body_length + sex + age + (1|id) + (1|year), data=voles)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
summary(m_voles3)
## Family: gaussian ( identity )
## Formula: mass ~ 1 + body_length + sex + age + (1 | id) + (1 | year)
## Data: voles
##
## AIC BIC logLik deviance df.resid
## 12074.2 12114.2 -6030.1 12060.2 2212
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## id (Intercept) 3.7521 1.937
## year (Intercept) 0.4693 0.685
## Residual 10.9278 3.306
## Number of obs: 2219, groups: id, 677; year, 7
##
## Dispersion estimate for gaussian family (sigma^2): 10.9
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.26855 1.63188 -11.81 < 2e-16 ***
## body_length 0.52057 0.01394 37.34 < 2e-16 ***
## sexMale 1.40825 0.22843 6.17 7.05e-10 ***
## ageJ -6.55243 0.31763 -20.63 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m_voles4 <- glmmTMB(mass ~ 1 + body_length + sex + age + (1|id) + (1 + sex|year), data=voles)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in fitTMB(TMBStruc): Model convergence problem; non-positive-definite
## Hessian matrix. See vignette('troubleshooting')
summary(m_voles4)
## Family: gaussian ( identity )
## Formula:
## mass ~ 1 + body_length + sex + age + (1 | id) + (1 + sex | year)
## Data: voles
##
## AIC BIC logLik deviance df.resid
## NA NA NA NA 2210
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 3.759775 1.93901
## year (Intercept) 0.525332 0.72480
## sexMale 0.007707 0.08779 -1.00
## Residual 10.922850 3.30497
## Number of obs: 2219, groups: id, 677; year, 7
##
## Dispersion estimate for gaussian family (sigma^2): 10.9
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.23745 1.63562 -11.76 < 2e-16 ***
## body_length 0.52031 0.01395 37.30 < 2e-16 ***
## sexMale 1.40432 0.23125 6.07 1.26e-09 ***
## ageJ -6.56908 0.31951 -20.56 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m_voles3, m_voles4)
What about the age difference?
m_voles5 <- glmmTMB(mass ~ 1 + body_length + age + (1|id) + (1 + age|year), data=voles)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
summary(m_voles5)
## Family: gaussian ( identity )
## Formula: mass ~ 1 + body_length + age + (1 | id) + (1 + age | year)
## Data: voles
##
## AIC BIC logLik deviance df.resid
## 12098.9 12144.6 -6041.5 12082.9 2211
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev. Corr
## id (Intercept) 4.2942 2.0722
## year (Intercept) 0.7571 0.8701
## ageJ 1.0087 1.0043 -0.69
## Residual 10.7667 3.2813
## Number of obs: 2219, groups: id, 677; year, 7
##
## Dispersion estimate for gaussian family (sigma^2): 10.8
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.03344 1.66968 -11.40 <2e-16 ***
## body_length 0.52375 0.01408 37.20 <2e-16 ***
## ageJ -6.40091 0.50600 -12.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m_voles5, m_voles2)
Let’s try to visualise that:
newdata5 <- expand_grid(body_length = seq(70, 135), age=c("A", "J"),
year=unique(voles$year), id=voles$id[1])
pred5_out <- predict(m_voles5, newdata = newdata5, se.fit = TRUE)
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
## Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j =
## integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for you
newdata5$mass <- pred5_out$fit
ggplot(voles, aes(x=body_length, y=mass, color=age) ) +
geom_point(alpha=0.4)+
geom_line(data=newdata5, mapping = aes(x=body_length, y=mass, color=age, by=as.factor(year)) )
## Warning: Ignoring unknown aesthetics: by
ggplot(voles, aes(x=body_length, y=mass) ) +
geom_point(alpha=0.4)+
geom_line(data=newdata5, mapping = aes(x=body_length, y=mass, color=as.factor(year)),
inherit.aes = FALSE) + facet_wrap(~age)
REyears <- ranef(m_voles5)$cond$year
REyears$totalyears <- REyears$ageJ+REyears$`(Intercept)`
For instance, if you have 2 fixed effect predictors and 1 random intercept effect, a simple linear mixed model can be written as:
Response = Intercept + Estimate1 × Predictor1+ Estimate2 × Predictor2 + Random deviation+ Error, with Random deviation and Error distributed both following a random normal distribution.
or more mathematically as:
\[ y_i = \mu + \beta_1 x_{1,i} + \beta_2 x_{2,i} + u_i + \epsilon_i \text{ , with } u \sim N(0, V_U) \text{ and } \epsilon \sim N(0, V_R) \]
You can fit that model as:
glmmTMB(response ~ 1 + predictor1 + predictor2 + (1|randomeffect), data=data)
or equivalently as:
glmmTMB(response ~ predictor1 + predictor2 + (1|randomeffect), data=data)
or equivalently as:
glmmTMB(response ~ predictor2 + (1|randomeffect) + predictor1, data=data)
Syntax rules:
(1|...)
More on random effect structures: http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#model-specification