Today:
Packages:
library(ggplot2)
library(performance)
library(DHARMa)
library(glmmTMB)
## Registered S3 method overwritten by 'glmmTMB':
## method from
## refit.glmmTMB DHARMa
Data:
download.file("https://timotheenivalis.github.io/data/reproduction.csv",
destfile = "data/reproduction.csv")
download.file("https://timotheenivalis.github.io/data/voles.csv",
destfile = "data/voles.csv")
Ideal count data produced by “Poisson” process. The Poisson distribution has a single parameter: the expected count, lambda.
randomcount <- rpois(n = 10000, lambda = 3.2)
mean(randomcount)
## [1] 3.1782
var(randomcount)
## [1] 3.136758
range(randomcount)
## [1] 0 11
summary(randomcount)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.000 3.000 3.178 4.000 11.000
randomcount <- rpois(n = 10000, lambda = 10.8)
mean(randomcount)
## [1] 10.7749
var(randomcount)
## [1] 10.84571
range(randomcount)
## [1] 1 24
summary(randomcount)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 8.00 11.00 10.77 13.00 24.00
ggplot(data.frame(x=randomcount), aes(x=x))+
geom_histogram(binwidth = 0.5,aes(y=stat(count) / sum(count)))
The mean of a true Poisson variable equals its variance. More in general in count variables that are not perfect Poisson, the variance is a function of the mean.
x <- seq(from = 1, to=10, by=1)
dat <- data.frame(x= rep(x, each=1000),
y=as.vector(sapply(x, function(x) {rpois(n = 1000, lambda = x)})))
ggplot(dat, aes(x=as.factor(x), y=y)) + geom_boxplot()
The three components of a glm:
The simplest solution is:
glm(obs ~ 1 + x , family = "poisson", data=data)
However you should NEVER use this. glm( family=“poisson”) assumes there is no unexplained variation apart from the incompressible Poisson random process, that is, there is no unexplained variation in expected counts, or \(V(\mathrm{exp}(Y)) = E(\mathrm{exp}(Y))\). This is almost never the case with biological data. Most of the time there is lots of unexplained variance and are over-dispersed. On the other hand, some biological processes are “conservative”/under-dispersed and have \(V(\mathrm{exp}(y)) < E(\mathrm{exp}(y))\) (for instance bird clutch size within species).
When you use simple Poisson glm on over-dispersed data you artificially decrease SE and p-value: you are going to find lots of false positive. When you use simple Poisson glm on under-dispersed data you artificially increase SE and p-value: you loose statistical power.
There are at least three options to obtain proper inference/prediction with count data:
repro <- read.csv("data/reproduction.csv")
str(repro)
## 'data.frame': 300 obs. of 3 variables:
## $ reproduction: int 0 0 3 1 0 3 1 10 0 3 ...
## $ size : num 1.59 4.55 6.17 2.69 4.86 ...
## $ sex : int 0 1 0 0 0 1 0 1 0 0 ...
summary(repro)
## reproduction size sex
## Min. : 0.00 Min. : 1.027 Min. :0.0000
## 1st Qu.: 0.00 1st Qu.: 2.735 1st Qu.:0.0000
## Median : 1.00 Median : 4.001 Median :1.0000
## Mean : 3.62 Mean : 4.139 Mean :0.5033
## 3rd Qu.: 3.00 3rd Qu.: 5.279 3rd Qu.:1.0000
## Max. :169.00 Max. :10.088 Max. :1.0000
mean(repro$reproduction)
## [1] 3.62
var(repro$reproduction)
## [1] 134.2565
ggplot(repro, aes(x=reproduction)) + geom_density()
The variance is larger than the mean; data are currently over-dispersed (although it is possible that data are not over-dispersed conditional on predictors.)
normdens <- data.frame(x=seq(-10,max(repro$reproduction), by = 0.1),
y=dnorm(x=seq(-10, max(repro$reproduction), by = 0.1),
mean = mean(repro$reproduction),
sd = sd(repro$reproduction)))
ggplot(repro, aes(x=reproduction)) + geom_histogram(aes(y=stat(count) / sum(count))) +
geom_line(data=normdens, aes(x = x, y=y), inherit.aes = FALSE, color="red")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
lmrepro <- lm(reproduction ~ size, data=repro)
summary(lmrepro)
##
## Call:
## lm(formula = reproduction ~ size, data = repro)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.314 -3.439 -1.120 1.013 158.288
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.0484 1.5900 -2.546 0.0114 *
## size 1.8526 0.3515 5.270 2.62e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.1 on 298 degrees of freedom
## Multiple R-squared: 0.08526, Adjusted R-squared: 0.08219
## F-statistic: 27.77 on 1 and 298 DF, p-value: 2.62e-07
glmrepro <- glm(reproduction ~ size, data = repro, family = "quasipoisson")
summary(glmrepro)
##
## Call:
## glm(formula = reproduction ~ size, family = "quasipoisson", data = repro)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.5644 -1.8016 -1.1823 0.1392 23.4704
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.83080 0.31504 -2.637 0.0088 **
## size 0.42785 0.05075 8.431 1.49e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 12.3918)
##
## Null deviance: 2892.0 on 299 degrees of freedom
## Residual deviance: 2063.5 on 298 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 6
ggplot(repro, aes(x=size, y=reproduction)) + geom_point()+
geom_smooth(method="lm",fullrange=TRUE, color="orange", fill="orange", alpha=0.2) +
geom_smooth(method="glm", method.args=list(family="poisson"), color="red", fill="red", fullrange=TRUE, alpha=0.2) +
geom_smooth(method="glm", method.args=list(family="quasipoisson"), color="blue", fill="blue", fullrange=TRUE, alpha=0.2) +
ylim(c(-1,30)) + xlim(c(0,max(repro$size)))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## Warning: Removed 3 rows containing missing values (geom_point).
Let’s build the prediction line by hand to understand the back-transformation.
sizex <- seq(from=0, to=10, by=0.1)
reprolatent <- coef(glmrepro)[1] + sizex*coef(glmrepro)[2]
#equivalent to:
predict(glmrepro, type = "link", newdata = data.frame(size=sizex))
## 1 2 3 4 5 6
## -0.83080074 -0.78801586 -0.74523099 -0.70244612 -0.65966124 -0.61687637
## 7 8 9 10 11 12
## -0.57409150 -0.53130663 -0.48852175 -0.44573688 -0.40295201 -0.36016713
## 13 14 15 16 17 18
## -0.31738226 -0.27459739 -0.23181252 -0.18902764 -0.14624277 -0.10345790
## 19 20 21 22 23 24
## -0.06067302 -0.01788815 0.02489672 0.06768160 0.11046647 0.15325134
## 25 26 27 28 29 30
## 0.19603621 0.23882109 0.28160596 0.32439083 0.36717571 0.40996058
## 31 32 33 34 35 36
## 0.45274545 0.49553033 0.53831520 0.58110007 0.62388494 0.66666982
## 37 38 39 40 41 42
## 0.70945469 0.75223956 0.79502444 0.83780931 0.88059418 0.92337905
## 43 44 45 46 47 48
## 0.96616393 1.00894880 1.05173367 1.09451855 1.13730342 1.18008829
## 49 50 51 52 53 54
## 1.22287317 1.26565804 1.30844291 1.35122778 1.39401266 1.43679753
## 55 56 57 58 59 60
## 1.47958240 1.52236728 1.56515215 1.60793702 1.65072189 1.69350677
## 61 62 63 64 65 66
## 1.73629164 1.77907651 1.82186139 1.86464626 1.90743113 1.95021601
## 67 68 69 70 71 72
## 1.99300088 2.03578575 2.07857062 2.12135550 2.16414037 2.20692524
## 73 74 75 76 77 78
## 2.24971012 2.29249499 2.33527986 2.37806473 2.42084961 2.46363448
## 79 80 81 82 83 84
## 2.50641935 2.54920423 2.59198910 2.63477397 2.67755885 2.72034372
## 85 86 87 88 89 90
## 2.76312859 2.80591346 2.84869834 2.89148321 2.93426808 2.97705296
## 91 92 93 94 95 96
## 3.01983783 3.06262270 3.10540757 3.14819245 3.19097732 3.23376219
## 97 98 99 100 101
## 3.27654707 3.31933194 3.36211681 3.40490169 3.44768656
pred1 <- data.frame(sizex=sizex, reprolatent=reprolatent)
ggplot(repro, aes(x=size, y=reproduction)) + geom_point() +
geom_line(data = pred1, aes(x=sizex, y=reprolatent), color="red")
pred1$reprodata <- exp(pred1$reprolatent)
#equivalent to:
predict(glmrepro, type = "response", newdata = data.frame(size=sizex))
## 1 2 3 4 5 6 7
## 0.4357003 0.4547462 0.4746247 0.4953721 0.5170265 0.5396274 0.5632163
## 8 9 10 11 12 13 14
## 0.5878364 0.6135327 0.6403522 0.6683442 0.6975597 0.7280524 0.7598780
## 15 16 17 18 19 20 21
## 0.7930948 0.8277636 0.8639479 0.9017140 0.9411309 0.9822709 1.0252092
## 22 23 24 25 26 27 28
## 1.0700246 1.1167989 1.1656179 1.2165710 1.2697513 1.3252564 1.3831878
## 29 30 31 32 33 34 35
## 1.4436516 1.5067584 1.5726238 1.6413685 1.7131182 1.7880043 1.8661639
## 36 37 38 39 40 41 42
## 1.9477402 2.0328824 2.1217465 2.2144951 2.3112981 2.4123326 2.5177838
## 43 44 45 46 47 48 49
## 2.6278445 2.7427164 2.8626096 2.9877439 3.1183481 3.2546616 3.3969337
## 50 51 52 53 54 55 56
## 3.5454250 3.7004074 3.8621645 4.0309926 4.2072008 4.3911116 4.5830617
## 57 58 59 60 61 62 63
## 4.7834027 4.9925012 5.2107401 5.4385189 5.6762548 5.9243828 6.1833574
## 64 65 66 67 68 69 70
## 6.4536526 6.7357633 7.0302060 7.3375198 7.6582673 7.9930357 8.3424380
## 71 72 73 74 75 76 77
## 8.7071138 9.0877308 9.4849859 9.8996063 10.3323512 10.7840127 11.2554179
## 78 79 80 81 82 83 84
## 11.7474298 12.2609492 12.7969163 13.3563122 13.9401612 14.5495322 15.1855409
## 85 86 87 88 89 90 91
## 15.8493516 16.5421797 17.2652936 18.0200173 18.8077324 19.6298812 20.4879689
## 92 93 94 95 96 97 98
## 21.3835664 22.3183135 23.2939215 24.3121766 25.3749431 26.4841666 27.6418779
## 99 100 101
## 28.8501967 30.1113352 31.4276022
ggplot(repro, aes(x=size, y=reproduction)) + geom_point() +
geom_line(data = pred1, aes(x=sizex, y=reprolatent), color="red")+
geom_line(data = pred1, aes(x=sizex, y=reprodata), color="blue")
summary(lm(log(reproduction) ~ size, data=repro))
summary(lm(log(reproduction+1) ~ size, data=repro))
summary(lm(log(reproduction+0.1) ~ size, data=repro))
summary(lm(log(reproduction+0.001) ~ size, data=repro))
summary(lm(log(reproduction+0.000000000000000001) ~ size, data=repro))
#let's extract the slope parameter for smaller and smaller deviations from zero
sapply(X = 1:30, function(x){
coef(lm(log(reproduction+10^(-x)) ~ size, data=repro))[2]
}
)
You cannot take the log of zero. The value of the slope is dependent on the arbitrary choice of a small deviation from zero.
Better solution yet, use a specialized package like glmmTMB
glmrepro2 <- glmmTMB(reproduction ~ size, data = repro, family = "nbinom1")
summary(glmrepro2)
## Family: nbinom1 ( log )
## Formula: reproduction ~ size
## Data: repro
##
## AIC BIC logLik deviance df.resid
## 1343.4 1354.5 -668.7 1337.4 297
##
##
## Overdispersion parameter for nbinom1 family (): 7.11
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.29963 0.19273 1.555 0.12
## size 0.21793 0.03485 6.254 4.01e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
newdat <- data.frame(size=seq(from=0, to=10, by=0.1))
predictobject <- predict(glmrepro2, type="link", se.fit = TRUE, newdata = newdat)
newdat$fit <- exp(predictobject$fit)
newdat$lowerCI <- exp(predictobject$fit - 1.96*predictobject$se.fit)
newdat$upperCI <- exp(predictobject$fit + 1.96*predictobject$se.fit)
ggplot(repro, aes(x=size, y=reproduction)) + geom_point()+
geom_smooth(method="glm", method.args=list(family="quasipoisson"), color="blue", fill="blue", fullrange=TRUE, alpha=0.2) +
geom_line(data=newdat, mapping = aes(x=size, y=fit))+
geom_ribbon(inherit.aes = FALSE, data=newdat, aes(x=size, ymin=lowerCI, ymax=upperCI), alpha=0.2)+
ylim(c(-1,30)) + xlim(c(0,max(repro$size)))
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).
## Warning: Removed 3 rows containing missing values (geom_point).
plot(lmrepro)
check_model(lmrepro)
## Not enough model terms in the conditional part of the model to check for multicollinearity.
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 298 rows containing missing values (geom_text_repel).
plot(lmrepro)
check_model(glmrepro)
## Not enough model terms in the conditional part of the model to check for multicollinearity.
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 299 rows containing missing values (geom_text_repel).
#plot(glmrepro2)#this is not available
check_model(glmrepro2)
## Not enough model terms in the conditional part of the model to check for multicollinearity.
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
No worries. GLMs have different assumptions than lm(). Plots for glm generally look weird even if everything is good.
To assess the performance of the model you need to simulate residuals according to model estimates and then compare the distribution of some properties between the simulated and the real residuals. The process is automated in DHARMa (but is not availble for quasi-Poisson models but it does not matter, the residuals are the same for poisson and quasi-Poisson).
sim1 <- simulateResiduals(glm(reproduction ~ size, data = repro, family = "poisson"))
testResiduals(sim1)
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.25334, p-value < 2.2e-16
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 5.734, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 2.5000e+01, outHigh = 2.1000e+01, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value < 2.2e-16
## alternative hypothesis: two.sided
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.25334, p-value < 2.2e-16
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 5.734, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 2.5000e+01, outHigh = 2.1000e+01, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value < 2.2e-16
## alternative hypothesis: two.sided
sim2 <- simulateResiduals(glmrepro2)
testResiduals(sim2)
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.091879, p-value = 0.01263
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 2.0856, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 1.0000e+00, outHigh = 2.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.6716
## alternative hypothesis: two.sided
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.091879, p-value = 0.01263
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 2.0856, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 1.0000e+00, outHigh = 2.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.6716
## alternative hypothesis: two.sided
So the poisson fitted in glm() is really bad; it does not capture variation. The model fitted in glmmTMB is not ideal either, but much better.
The second family of negative binomial performs better:
glmrepro3 <- glmmTMB(reproduction ~ size, data = repro, family = "nbinom2")
sim3 <- simulateResiduals(glmrepro3)
testResiduals(sim3)
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.06909, p-value = 0.1141
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 1.8154, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 3.0000e+00, outHigh = 3.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.2383
## alternative hypothesis: two.sided
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.06909, p-value = 0.1141
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 1.8154, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 3.0000e+00, outHigh = 3.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.2383
## alternative hypothesis: two.sided
Finally, in the case of over-dispersion (it does not work with under-dispersion!) we can model extra variance using an observation-level random effect:
repro$row <- 1:nrow(repro)
glmrepro4 <- glmmTMB(reproduction ~ sex +size + (1|row),
data = repro, family = "poisson")
sim4 <- simulateResiduals(glmrepro4)
testResiduals(sim4)
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.043371, p-value = 0.6251
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 1.5972, p-value = 0.128
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 1.0000e+00, outHigh = 1.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 1
## alternative hypothesis: two.sided
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.043371, p-value = 0.6251
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 1.5972, p-value = 0.128
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 1.0000e+00, outHigh = 1.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 1
## alternative hypothesis: two.sided
Here the random effect model provides the best fit to the data and (probably) the correct amount of uncertainty for parameter estimates. However this works only for over-dispersion and is more difficult to work with (you need to think about including random effect variance in predictions). For other data the negative binomial models will perform better.
pred4 <- predict(glmrepro4, type = "response")
mean(pred4) ; mean(repro$reproduction)
## [1] 3.522955
## [1] 3.62
var(pred4) ; var(repro$reproduction)
## [1] 126.3489
## [1] 134.2565
hist(pred4); hist(repro$reproduction)
pred3 <- predict(glmrepro3, type = "response")
mean(pred3) ; mean(repro$reproduction)
## [1] 3.517754
## [1] 3.62
var(pred3) ; var(repro$reproduction)
## [1] 11.51497
## [1] 134.2565
hist(pred3); hist(repro$reproduction)
General appraoch:
glmreprosex <- glmmTMB(reproduction ~ sex + (1|row),
data = repro, family = "poisson")
summary(glmreprosex)
## Family: poisson ( log )
## Formula: reproduction ~ sex + (1 | row)
## Data: repro
##
## AIC BIC logLik deviance df.resid
## 1315.7 1326.8 -654.9 1309.7 297
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## row (Intercept) 1.729 1.315
## Number of obs: 300, groups: row, 300
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.03565 0.14444 0.247 0.8050
## sex 0.43713 0.18588 2.352 0.0187 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tapply(X = predict(glmreprosex, re.form = NULL, type = "response"),
INDEX = repro$sex, mean)
## 0 1
## 3.390593 3.660415
simrs1 <- simulateResiduals(glmreprosex)
testResiduals(simrs1)
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.032701, p-value = 0.9054
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 1.6997, p-value = 0.072
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 2.0000e+00, outHigh = 3.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.2383
## alternative hypothesis: two.sided
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.032701, p-value = 0.9054
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 1.6997, p-value = 0.072
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 2.0000e+00, outHigh = 3.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.2383
## alternative hypothesis: two.sided
glmreprosex2 <- glmmTMB(reproduction ~ sex ,
data = repro, family = "nbinom1")
summary(glmreprosex)
## Family: poisson ( log )
## Formula: reproduction ~ sex + (1 | row)
## Data: repro
##
## AIC BIC logLik deviance df.resid
## 1315.7 1326.8 -654.9 1309.7 297
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## row (Intercept) 1.729 1.315
## Number of obs: 300, groups: row, 300
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.03565 0.14444 0.247 0.8050
## sex 0.43713 0.18588 2.352 0.0187 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tapply(X = predict(glmreprosex, type = "response"),
INDEX = repro$sex, mean)
## 0 1
## 3.390593 3.660415
simrs2 <- simulateResiduals(glmreprosex2)
testResiduals(simrs2)
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.088339, p-value = 0.01852
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 2.0528, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 0.0000e+00, outHigh = 4.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.06609
## alternative hypothesis: two.sided
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.088339, p-value = 0.01852
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 2.0528, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 0.0000e+00, outHigh = 4.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.06609
## alternative hypothesis: two.sided
glmreprosex3 <- glmmTMB(reproduction ~ sex ,
data = repro, family = "nbinom2")
summary(glmreprosex3)
## Family: nbinom2 ( log )
## Formula: reproduction ~ sex
## Data: repro
##
## AIC BIC logLik deviance df.resid
## 1378.1 1389.2 -686.1 1372.1 297
##
##
## Overdispersion parameter for nbinom2 family (): 0.439
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.24988 0.13122 9.525 <2e-16 ***
## sex 0.07143 0.18460 0.387 0.699
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tapply(X = predict(glmreprosex, type = "response"),
INDEX = repro$sex, mean)
## 0 1
## 3.390593 3.660415
simrs3 <- simulateResiduals(glmreprosex3)
testResiduals(simrs3)
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.084508, p-value = 0.02755
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 2.0349, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 1.0000e+00, outHigh = 3.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.2383
## alternative hypothesis: two.sided
## $uniformity
##
## One-sample Kolmogorov-Smirnov test
##
## data: simulationOutput$scaledResiduals
## D = 0.084508, p-value = 0.02755
## alternative hypothesis: two-sided
##
##
## $dispersion
##
## DHARMa nonparametric dispersion test via sd of residuals fitted vs.
## simulated
##
## data: simulationOutput
## ratioObsSim = 2.0349, p-value < 2.2e-16
## alternative hypothesis: two.sided
##
##
## $outliers
##
## DHARMa outlier test based on exact binomial test
##
## data: simulationOutput
## outLow = 1.0000e+00, outHigh = 3.0000e+00, nobs = 3.0000e+02, freqH0 =
## 3.9841e-03, p-value = 0.2383
## alternative hypothesis: two.sided
voles <- read.csv("data/voles.csv")
summary(voles)
## id year sex age mass
## CN07-096: 4 Min. :2006 Female:686 A:530 Min. :11.00
## CN07-113: 4 1st Qu.:2007 Male :608 J:764 1st Qu.:23.00
## CN07-114: 4 Median :2009 Median :29.00
## CN06-036: 3 Mean :2010 Mean :30.97
## CN06-065: 3 3rd Qu.:2013 3rd Qu.:40.00
## CN06-089: 3 Max. :2015 Max. :62.00
## (Other) :1273 NA's :26
## survival reproduction fitness StMass
## Min. :0.0000 Min. : 0.000 Min. : 0.000 Min. :-1.9245
## 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.:-0.7683
## Median :0.0000 Median : 0.000 Median : 0.000 Median :-0.1902
## Mean :0.2141 Mean : 1.083 Mean : 1.512 Mean : 0.0000
## 3rd Qu.:0.0000 3rd Qu.: 1.000 3rd Qu.: 2.000 3rd Qu.: 0.8696
## Max. :1.0000 Max. :15.000 Max. :15.000 Max. : 2.9893
## NA's :26
## mother
## CN13-002: 20
## CN06-104: 17
## CN07-125: 17
## CN06-050: 15
## CN07-036: 15
## (Other) :999
## NA's :211
Model reproduction