Introduction to the snow vole dataset
For these exercises we will use some published data, freely available from the Dryad repository: Bonnet, T & Postma, E (2018). Data from: Fluctuating selection and its (elusive) evolutionary consequences in a wild rodent population. Dryad Digital Repository. https://datadryad.org//resource/doi:10.5061/dryad.6767m.
For some context, these data are from the long-term monitoring of a population of snow voles (Chionomys nivalis) in the Swiss Alps.
We will use only the file “Phenotypic data” or “YearPheno.txt”.
What exactly the variables are is not crucial, but having a vague idea will help you make sense of the exercises:
- ID: focal individual unique identifier
- animal: duplicate of ID to fit relatedness matrices in animal model
- Mother: mother of the focal individual
- Year: year of measurement
- Sex: Female or male
- Age: Juvenile or adult
- BMI: Body mass index.
- RJst: Standardized relative Julian day
- RJ2st: Squared Standardized relative Julian day
- Phi and PhiZ: survival to the next year (duplicated for historical reasons)
- RhoZ: reproduction on the next year
- FitnessZ: RhoZ + 2Phi. Check the publication for details
- GGImm: proportion of assumed immigrant ancestry used to fit genetic groups.
- BMIst: BMI standardized within years
- BMI1 and BMI2: BMI in years of positive or negative selection.
Exercise 1: Understanding the data
1.1 Load the data
Load YearPheno.txt into R using one line of code, then look at the summary() of the dataset There are several ways to do that, just be sure you get proper column names (if summary returns variable names V1, V2… then you need to change an option in the function you use to load the data)
How many missing values are there for the variable BMI ?
1.2 Visualize the main variables
We are mainly interested in the relationship between BMI (a body mass index) and Phi, RhoZ and FitnessZ (all three measures of Darwinian individual fitness).
Visualize the pairwise-relationships between all four variables. You may use many approaches, have all relationships in one plot or in multiple plots.
Three of the four variables are visibly correlated to each other, which ones?
What type of data are the variables? Continuous, discrete, binary, integer…
What values can FitnessZ take? How does it differ from RhoZ? The functions table() and hist() may be useful here
Exercise 2: Linear models
2.1 BMI, age and sex
Fit linear models to try and answer the following questions:
- Does BMI differ between males and females?
- Does BMI differ between adults and juveniles?
Is the age difference in BMI sex specific? (that is, is there an interaction sex-by-age?)
Check the diagnostic graphes of your models (using plot(lm(…))). What do you see? Why? Is it okay?
- Make a graph representing expected BMI for the four age-sex-classes (Female-J, Female-A, Male-J, Male-A).
Add standard errors onto the graph. (Here you may use one of your linear model to achieve these tasks, or work directly with raw data. Try both!)
2.2 Effect of date
- RJst is a standardized version of date (within a year). How does BMI change through a year?
- Is it the same for all ages and sexes?
- Show the effect(s) of date on BMI graphically.
2.3 Time trend in BMI?
Fit a linear model to see if BMI has changed through years (using the variable Year).
Then add Age and Sex to your model (and optionally their interaction, as well as RJst). What is the effect of Year? How could you interpret the difference with the previous model?
2.4 Assumptions
- What assumptions of linear models may have been violated in all of the previous models? What to do about it? Hint: wikipedia knows what the assumptions of linear models are in case you don’t remember. It may not be expressed in an easy language, but it is very useful to train yourself to read it.
Exercise 3 Mixed models
3.1 Fit a Linear Mixed Model
- Use the package lme4 (or another one you prefer) to fit ID as a random effect in a model of BMI (also include fixed effects: Age, Sex, RJst and appropriate interactions).
- What is the estimated variance among individuals.
- Test the statistical significance of the random effect using a likelihood ratio test.
- What is the individual repeatability of BMI? How does the value change if you include or do not include Sex in the model? Why?
- Estimate confidence intervals for the variance components (the function confint() would work for lme4)
3.2 Crossed-random effects
Exercise 4 Generalized linear models
4.1 From LM to GLM
- Up until now, among our four variables of interest, we have worked only with BMI. Why not to do a linear model with Phi, RhoZ, or FitnessZ?
- Try a linear model with Phi (which is survival) as the response, RJst (which is date) as a predictor. Check the model summary, the model diagnostic, and make a graph of the model prediction… what is wrong?
4.2 Logistic regression
- Use logistic regressions (binomial GLMs) to test the effect of Age on Phi (survival probability).
- Use your models to predict the survival probability of adults and that of juveniles.
- Add BMI to the model. What is the effect of BMI?
- Make a graph of survival probability as a function of BMI.
4.3 RhoZ
- What type of GLM could you use for the variable RhoZ?
- What is the effect of BMI on RhoZ?
Exercise 5 Generalized linear mixed models