Shinichi Nakagawa, David F. Westneat, Ayumi Mizuno, Yimen G. Araya-Ajoy, Barbara Class, Niels Dingemanse, Ned A. Dochtermann, Malgorzata Lagisz, Kate L. Laskowski, Joel L. Pick, Denis Réale, Coralie Williams, Jonathan Wright and Holger Schielzeth
1 Introduction
This online material is a supplement to our paper “Understanding different types of repeatabilty and intra-class correlation for an analysis of biological variation”. We will show how to obtain different types of repeatability (intra-class correlation) using the lme4, rptR and partR2 packages in R.
2 Contents
In this online material, we will illustrate how to get repeatabilty for four traits: 1) body length of adult beetles (Gaussian distribution, N = 2,160), 2) frequencies of the two distinct male colour morphs (binomial or Bernoulli distribution, N = 1,080), 3) the number of eggs laid by each female (Poisson distribution) after random mating (N = 1,080) and 4) the activity levels (log-normally distributed) with three observations per individual (N = 6,480).
Fig: An overview of experimental and sampling design
After generating these data, we will:
Use lme4 to fit a full model, which match the data generating process, and a reduced model, which only fit the target clustering variable (e.g., Population).
Use rptR and partR2 to obtain variance component for each fixed and random effect including residuals.
By using estimated variance components, estimate different types of repeatability (e.g., adjusted, unadjusted and marginal repeatability)
In addition, we show how to obtain conditional repeatability for body length at different levels of moisture (0, 1 and 2).
Also, importantly, both rptR and partR2 packages have vignettes that provide detailed information on how to use them and associated publications.
The rptR package is available on CRAN and the vignette can be found here.
Stoffel, M. A., Nakagawa, S., & Schielzeth, H. (2017). rptR: Repeatability estimation and variance decomposition by generalized linear mixed‐effects models. Methods in ecology and evolution, 8(11), 1639-1644.
The partR2 package is available on CRAN and the vignette can be found here.
Stoffel, M. A., Nakagawa, S., & Schielzeth, H. (2021). partR2: Partitioning R2 in generalized linear mixed models. PeerJ, 9, e11414.
3 Prerequisites
3.1 Loading packages
Our tutorial uses R statistical software and existing R packages, which you will first need to download and install.
If the packages are archived in CRAN, use install.packages() to install them. For example, to install the lme4 , you can execute install.packages("lme4") in the console (bottom left pane of R Studio).
Version information of each package is listed at the end of this tutorial.
Code
# clearing memoryrm(list =ls())# attempt to load or install necessary packagesif(!require(pacman)) install.packages("pacman")pacman::p_load(lme4, lmerTest, rptR, partR2, pander, here, sessioninfo, details)
4 Basic data and preparation
Here we create identifiers for Population, Container, Individual, Sex, Moisture and Student for 12 populations, 72 containers, 30 individuals per container, and 3 conditions (moisture levels). The data are generated at the individual level and then expanded to the observation level with three observations per individual.
Code
# number of thingsn_populations <-12# number of populationsn_containers <-72# number of containersn_samples <-30# number of samples per containern_individuals <-2160# number of individuals# Data at the individual level# 12 different populations Population <-gl(n_populations, n_samples*2*3, n_individuals) # 2 sex and 3 conditions# 120 containers (30 individuals in each container)Container <-gl(n_containers, n_samples, n_individuals)# Individual IDIndividual <-factor(1:n_individuals)# Sex of the individuals. Uni-sex within each container (individuals are sorted at the pupa stage)Sex <-factor(rep(c(rep("Female", n_samples), rep("Male", n_samples)), n_containers/2))# Observer (person A and B)Student <-factor(rep(c("A", "B"), each = n_individuals/2))# Moisture levelsMoisture <-rep(rep(0:2, each=60), n_populations) # 3 levels of moisture (1, 2, 3)# the first basic data set with matrix of 2160 x 6dat1 <-data.frame(Population = Population, Individual = Individual, Container = Container, Sex = Sex, Moisture = Moisture, Student = Student)# Data at the observation level#Individual <- factor(rep(1:n_individuals, 3)) # 3 observations per individual# Basic data set with a matrix of 6480 x 5 dat2 <-rbind(dat1, dat1, dat1) # 3 observations per individualdat2 <- dat2[, -6] # taking out student info (as activity level was measured by a machine)
5 Body lengh
Body length is a continuous trait that is normally distributed.
5.1 Data generation
Code
# Setting the seedset.seed(22376)# Simulate random effects for PopulationsPopulation_intercept <-rnorm(n_populations, 0, sqrt(1.3)) # variance of 1.3Population_slope <-rnorm(n_populations, 0, sqrt(0.1)) # variance of 0.1# Simulate random effect for containers (random intercept)Container_intercept <-rnorm(n_containers, 0, sqrt(0.1)) # variance of 0.1# Generate outcome (BodyL) with both random intercept and random slope for Moisture per populationdat1$BodyL <-15-3* (as.numeric(dat1$Sex) -1) +0.4* dat1$Moisture +# fixed effect of Moisture0.05* (as.numeric(dat1$Student) -1) +# fixed effect of Student Population_intercept[dat1$Population] +# random intercept: Population Population_slope[dat1$Population] * dat1$Moisture +# random slope: Population x Moisture Container_intercept[dat1$Container] +# random intercept: Containerrnorm(n_individuals, 0, sqrt(1)) # residual (variance of 1)
We use the lmer function in lme4 to fix a ‘full’ model, which completely match with the data generating process. The model includes fixed effects of Sex, Moisture and Student, and random intercepts for Container and random slopes for Moisture by Population.
Our target clustering (grouping) variable is Population, for which we want to estimate repeatability.
Code
# fitting a mixed modelsize_full <-lmer(BodyL ~1+ Sex + Moisture + Student + (1|Container) + (1+ Moisture|Population),data = dat1)summary(size_full)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [## lmerModLmerTest]## Formula: BodyL ~ 1 + Sex + Moisture + Student + (1 | Container) + (1 + ## Moisture | Population)## Data: dat1## ## REML criterion at convergence: 6230.9## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.1363 -0.6899 0.0214 0.6819 3.3819 ## ## Random effects:## Groups Name Variance Std.Dev. Corr ## Container (Intercept) 0.08078 0.2842 ## Population (Intercept) 3.04570 1.7452 ## Moisture 0.10063 0.3172 -0.56## Residual 0.97053 0.9852 ## Number of obs: 2160, groups: Container, 72; Population, 12## ## Fixed effects:## Estimate Std. Error df t value Pr(>|t|) ## (Intercept) 14.59595 0.66531 11.57950 21.938 8.58e-11 ***## SexMale -2.89866 0.07928 47.00097 -36.563 < 2e-16 ***## Moisture 0.46650 0.10365 10.99947 4.501 0.0009 ***## StudentB 1.39444 0.85635 9.99965 1.628 0.1345 ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Correlation of Fixed Effects:## (Intr) SexMal Moistr## SexMale -0.060 ## Moisture -0.410 0.000 ## StudentB -0.644 0.000 0.000
We can also fit a reduced model, which only fit our target clustering variable, Population.
Code
# fitting a reduced modelsize_reduced <-lmer(BodyL ~1+ (1|Population), data = dat1)summary(size_reduced)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [## lmerModLmerTest]## Formula: BodyL ~ 1 + (1 | Population)## Data: dat1## ## REML criterion at convergence: 8802.1## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -2.55616 -0.79195 -0.00211 0.81433 2.56497 ## ## Random effects:## Groups Name Variance Std.Dev.## Population (Intercept) 2.508 1.584 ## Residual 3.355 1.832 ## Number of obs: 2160, groups: Population, 12## ## Fixed effects:## Estimate Std. Error df t value Pr(>|t|) ## (Intercept) 14.3103 0.4588 11.0000 31.19 4.37e-12 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5.3 Variance decomposition
We now want to get variance for each fixed and random effect, including residuals. We can use the rptR and partR2 package to do this.
# the same model as `size_full`size_rpt_full <-rpt(BodyL ~1+ Student + Sex + Moisture + (1| Container) + (1+ Moisture| Population), grname =c("Container", "Population", "Fixed", "Residual"), data = dat1, datatype ="Gaussian",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(size_rpt_full, here("Rdata", "size_rpt_full.rds"))# partR2 cannot process a random slope model so take out# we are assuming that fixed variances are teh same with or without the random slopesize_full2 <-lmer(BodyL ~1+ Sex + Moisture + Student + (1|Container) + (1|Population),data = dat1)# No random slopessize_part <-partR2(size_full2, partvars =c("Moisture", "Sex", "Student"),nboot =NULL)# savingsaveRDS(size_part, here("Rdata", "size_part.rds"))
Code
# reading in Rdatasize_rpt_full <-readRDS(here("Rdata", "size_rpt_full.rds"))size_part <-readRDS(here("Rdata", "size_part.rds"))size_rpt_full## ## ## Variance estimation using the lmm method ## ## Variance of Container## Var = 0.081## SE = 0.023## CI = [0.039, 0.13]## P = 4.89e-14 [LRT]## NA [Permutation]## ## Variance of Population## Var = 2.591## SE = 1.163## CI = [0.826, 5.182]## P = 9.01e-31 [LRT]## NA [Permutation]## ## Variance of Residual## Var = 0.971## SE = 0.03## CI = [0.911, 1.032]## P = NA [LRT]## NA [Permutation]## ## Variance of Fixed## Var = 2.733## SE = 0.759## CI = [2.145, 4.783]## P = NA [LRT]## NA [Permutation]size_part## ## ## R2 (marginal) and 95% CI for the full model: ## R2 CI_lower CI_upper nboot ndf## 0.4059 NA NA 1 4 ## ## ----------## ## Part (semi-partial) R2:## Predictor(s) R2 CI_lower CI_upper nboot ndf## Model 0.4059 NA NA 1 4 ## Moisture 0.0239 NA NA 1 3 ## Sex 0.3458 NA NA 1 3 ## Student 0.0362 NA NA 1 3 ## Moisture+Sex 0.3697 NA NA 1 2 ## Moisture+Student 0.0601 NA NA 1 2 ## Sex+Student 0.3820 NA NA 1 2 ## Moisture+Sex+Student 0.4059 NA NA 1 1
Code
# the same model as `size_full`size_rpt_reduced <-rpt(BodyL ~1+ (1|Population),grname =c("Population", "Residual"), data = dat1, datatype ="Gaussian",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(size_rpt_reduced, here("Rdata", "size_rpt_reduced.rds"))
Code
# reading in Rdatasize_rpt_reduced <-readRDS(here("Rdata", "size_rpt_reduced.rds"))size_rpt_reduced## ## ## Variance estimation using the lmm method ## ## Variance of Population## Var = 2.508## SE = 1.084## CI = [0.918, 5.072]## P = 1.05e-234 [LRT]## NA [Permutation]## ## Variance of Residual## Var = 3.355## SE = 0.1## CI = [3.166, 3.554]## P = NA [LRT]## NA [Permutation]
5.4 Repeatability calculation
Our target clustering variable is Population, and we want to estimate repeatability (R) as the ratio of the variance of the target clustering variable (Population) to the total variance.
Here we have two kinds of effects: 1) effects which we want to remove (adjust) from repeatability calculation (Student and Container) and 2) effects which we want to retain (Sex, Moistureand Population).
Code
# this is mixture of Rv1, Rv2 and Rv3# also, this repeatability is marginalized in relation to `Moisture`# getting adjusted fixed effect variance.# getting % of `Sex` and `Moisture in relation to the total fixed effect varianceadj_factor <- size_part$R2[5 ,"estimate"][[1]]/ size_part$R2[8 ,"estimate"][[1]]# getting % of # variance for `Sex`, `Moisture'Fixed_point <- size_rpt_full$R[1, "Fixed"] * adj_factor Fixed_boot <- size_rpt_full$R_boot[, "Fixed"] * adj_factor# getting % of 'Moisture' to the total fixed effect varianceadj_factor2 <- size_part$R2[2 ,"estimate"][[1]]/ size_part$R2[8 ,"estimate"][[1]]# variance for `Moisture'Fixed_point2 <- size_rpt_full$R[1, "Fixed"] * adj_factor2 Fixed_boot2 <- size_rpt_full$R_boot[, "Fixed"] * adj_factor2# point estimate(size_rpt_full$R[1, "Population"] + Fixed_point)/ (size_rpt_full$R[1, "Population"] + Fixed_point + size_rpt_full$R[1, "Residual"])## [1] 0.8396141# 95% CI using bootstrappingCI <- (size_rpt_full$R_boot[, "Population"] + Fixed_boot) / (size_rpt_full$R_boot[, "Population"] + Fixed_boot + size_rpt_full$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.7700461 0.8936619# repeatability - now removing `Sex` effect from numerator but not denominator# point estimate(size_rpt_full$R[1, "Population"] + Fixed_point2)/ (size_rpt_full$R[1, "Population"] + Fixed_point + size_rpt_full$R[1, "Residual"])## [1] 0.4548239# 95% CI using bootstrappingCI <- (size_rpt_full$R_boot[, "Population"] + Fixed_boot2) / (size_rpt_full$R_boot[, "Population"] + Fixed_boot + size_rpt_full$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.2157872 0.6125273# Extra - variance component for `Moisture` `Sex` and `Student`.# SexFixed_point - Fixed_point2## [1] 2.328446# MoistureFixed_point2## [1] 0.1608182# Studentsize_rpt_full$R[1, "Fixed"] - Fixed_point## [1] 0.2437726
Activity level (log-transformed) is a continuous trait (i.e., mm traveled) that is log-normally distributed. We have three observations per individual.
6.1 Data generation
Code
# Setting the seedset.seed(4242)# Simulate random effects for PopulationsPopulation_intercept <-rnorm(n_populations, 0, sqrt(1.3))Population_slope <-rnorm(n_populations, 0, sqrt(0.1)) # random slope variance (adjustable)# Simulate random effect for containers (random intercept)Container_intercept <-rnorm(n_containers, 0, sqrt(0.1))# Individual interceptIndividual_intercept <-rnorm(n_individuals, 0, sqrt(2))dat2$Activity <-exp(0.1+1.2* (as.numeric(dat2$Sex) -1) +0.3* dat2$Moisture +# fixed effect of Moisture Population_intercept[dat2$Population] +# random intercept: Population Population_slope[dat2$Population] * dat2$Moisture +# random slope: Population x Moisture Container_intercept[dat2$Container] +# random intercept: Container Individual_intercept[dat2$Individual] +# random intercept: Individualrnorm(n_individuals*3, 0, sqrt(0.5)) ) # residual
We use lmer function in lme4 to fix a ‘full’ model, which completely match with the data generating process. The model includes fixed effects of Sex, Moisture, and random intercepts for Container and random slopes for Moisture by Population, and random intercepts for Individual.
Code
# fitting a mixed modelact_full <-lmer(log(Activity) ~1+ Sex + Moisture + (1|Container) + (1+ Moisture|Population) + (1| Individual), data = dat2)summary(act_full)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [## lmerModLmerTest]## Formula: ## log(Activity) ~ 1 + Sex + Moisture + (1 | Container) + (1 + Moisture | ## Population) + (1 | Individual)## Data: dat2## ## REML criterion at convergence: 19620.5## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.5039 -0.5558 -0.0091 0.5678 2.8141 ## ## Random effects:## Groups Name Variance Std.Dev. Corr## Individual (Intercept) 1.9698 1.4035 ## Container (Intercept) 0.1846 0.4296 ## Population (Intercept) 1.6600 1.2884 ## Moisture 0.0801 0.2830 0.13## Residual 0.5072 0.7122 ## Number of obs: 6480, groups: Individual, 2160; Container, 72; Population, 12## ## Fixed effects:## Estimate Std. Error df t value Pr(>|t|) ## (Intercept) -0.09765 0.38829 11.53788 -0.251 0.8059 ## SexMale 1.24965 0.11922 46.99876 10.481 6.87e-14 ***## Moisture 0.26923 0.10957 11.00181 2.457 0.0318 * ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Correlation of Fixed Effects:## (Intr) SexMal## SexMale -0.154 ## Moisture -0.036 0.000
We can also fit a reduced model, which only fit our target clustering variable, Indiviudal.
Code
# fitting a reduced modelact_reduced <-lmer(log(Activity) ~1+ (1|Individual), data = dat2)summary(act_reduced)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [## lmerModLmerTest]## Formula: log(Activity) ~ 1 + (1 | Individual)## Data: dat2## ## REML criterion at convergence: 21073.2## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.5923 -0.5556 -0.0123 0.5626 2.7947 ## ## Random effects:## Groups Name Variance Std.Dev.## Individual (Intercept) 4.3127 2.0767 ## Residual 0.5072 0.7122 ## Number of obs: 6480, groups: Individual, 2160## ## Fixed effects:## Estimate Std. Error df t value Pr(>|t|) ## (Intercept) 7.964e-01 4.555e-02 2.159e+03 17.48 <2e-16 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
6.3 Variance decomposition
We now want to get variance for each fixed and random effect, including residuals. We can use the rptR and partR2 package to do this.
# the same model as `size_full`act_rpt_full <-rpt(log(Activity) ~1+ Sex + Moisture + (1|Container) + (1+ Moisture|Population) + (1| Individual), grname =c("Container", "Population", "Individual", "Fixed", "Residual"), data = dat2, datatype ="Gaussian",nboot =1000, npermut =1000, ratio =FALSE)# savingsaveRDS(act_rpt_full, here("Rdata", "act_rpt_full.rds"))# partR2 cannot process a random slope model so take outact_full2 <-lmer(log(Activity) ~1+ Sex + Moisture + (1|Container) + (1|Population) + (1| Individual), data = dat2)# No random slopes# we are assuming that fixed variances are teh same with or without the random slopeact_part <-partR2(act_full2, partvars =c("Moisture", "Sex"),nboot =NULL)# savingsaveRDS(act_part, here("Rdata", "act_part.rds"))
Code
# reading in Rdataact_rpt_full <-readRDS(here("Rdata", "act_rpt_full.rds"))act_part <-readRDS(here("Rdata", "act_part.rds"))act_rpt_full## ## ## Variance estimation using the lmm method ## ## Variance of Container## Var = 0.185## SE = 0.05## CI = [0.09, 0.285]## P = 1.14e-14 [LRT]## 0.004 [Permutation]## ## Variance of Population## Var = 1.885## SE = 0.791## CI = [0.69, 3.668]## P = 3.01e-20 [LRT]## 0.324 [Permutation]## ## Variance of Individual## Var = 1.97## SE = 0.068## CI = [1.835, 2.105]## P = 0 [LRT]## 0.001 [Permutation]## ## Variance of Residual## Var = 0.507## SE = 0.011## CI = [0.487, 0.529]## P = NA [LRT]## NA [Permutation]## ## Variance of Fixed## Var = 0.439## SE = 0.083## CI = [0.303, 0.623]## P = NA [LRT]## NA [Permutation]act_part## ## ## R2 (marginal) and 95% CI for the full model: ## R2 CI_lower CI_upper nboot ndf## 0.0881 NA NA 1 3 ## ## ----------## ## Part (semi-partial) R2:## Predictor(s) R2 CI_lower CI_upper nboot ndf## Model 0.0881 NA NA 1 3 ## Moisture 0.0097 NA NA 1 2 ## Sex 0.0784 NA NA 1 2 ## Moisture+Sex 0.0881 NA NA 1 1
Code
# the same model as `size_full`act_rpt_reduced <-rpt(log(Activity) ~1+ (1| Individual), grname =c("Individual", "Residual"), data = dat2, datatype ="Gaussian",nboot =1000, npermut =0,ratio =FALSE)# savingsaveRDS(act_rpt_reduced, here("Rdata", "act_rpt_reduced.rds"))
Code
# reading in Rdataact_rpt_reduced <-readRDS(here("Rdata", "act_rpt_reduced.rds"))act_rpt_reduced## ## ## Variance estimation using the lmm method ## ## Variance of Individual## Var = 4.313## SE = 0.133## CI = [4.047, 4.583]## P = 0 [LRT]## NA [Permutation]## ## Variance of Residual## Var = 0.507## SE = 0.011## CI = [0.486, 0.528]## P = NA [LRT]## NA [Permutation]
6.4 Repeatability calculation
Our target clustering variable is individual, and we want to estimate repeatability (R) as the ratio of the variance of the target clustering variable (individual) to the total variance.
Here we have two kinds of effects: 1) effects which we want to remove (adjust) from repeatability calculation (Container) and 2) effects which we want to retain (Sex, Moisture, Population and individual).
Code
# this is mixture of Rv1, Rv2 and Rv3# also, this repeatability is marginalized in relation to `Moisture`# getting adjusted fixed effect variance.# getting % of 'Moisture' to the total fixed effect varianceadj_factor3 <- act_part$R2[2 ,"estimate"][[1]]/ act_part$R2[4 ,"estimate"][[1]]# variance for `Moisture'Fixed_point3 <- act_rpt_full$R[1, "Fixed"] * adj_factor3 Fixed_boot3 <- act_rpt_full$R_boot[, "Fixed"] * adj_factor3# point estimate(act_rpt_full$R[1, "Individual"] + act_rpt_full$R[1, "Population"] + act_rpt_full$R[1, "Fixed"])/ (act_rpt_full$R[1, "Individual"] + act_rpt_full$R[1, "Population"]+ act_rpt_full$R[1, "Fixed"] + act_rpt_full$R[1, "Residual"])## [1] 0.8943532# 95% CI using bootstrappingCI <- (act_rpt_full$R_boot[, "Individual"] + act_rpt_full$R_boot[, "Population"] + act_rpt_full$R_boot[, "Fixed"])/ (act_rpt_full$R_boot[, "Individual"] + act_rpt_full$R_boot[, "Population"] + act_rpt_full$R_boot[, "Fixed"] + act_rpt_full$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.8575955 0.9236931# repeatability - now removing `Sex` effect from numerator but not denominator# point estimate# point estimate(act_rpt_full$R[1, "Individual"] + act_rpt_full$R[1, "Population"] + Fixed_point3)/ (act_rpt_full$R[1, "Individual"] + act_rpt_full$R[1, "Population"]+ act_rpt_full$R[1, "Fixed"] + act_rpt_full$R[1, "Residual"])## [1] 0.8130215# 95% CI using bootstrappingCI <- (act_rpt_full$R_boot[, "Individual"] + act_rpt_full$R_boot[, "Population"] + Fixed_boot3)/ (act_rpt_full$R_boot[, "Individual"] + act_rpt_full$R_boot[, "Population"] + act_rpt_full$R_boot[, "Fixed"] + act_rpt_full$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.7446819 0.8672014# Extra - variance component for `Moisture` and `Sex`. # MoistureFixed_point3## [1] 0.04833146# Sexact_rpt_full$R[1, "Fixed"] - Fixed_point3## [1] 0.3904651
The frequencies of the two distinct male colour morphs: black (0) or reddish (1) is a binomial or Bernoulli trait.
7.1 Data generation
Code
# Setting the seedset.seed(123)# Subset the design matrix (only males express colour morphs)datm <- dat1[dat1$Sex =="Male", ]# Simulate random effects for PopulationsPopulation_intercept <-rnorm(n_populations, 0, sqrt(1.2))# Simulate random effect for containers (random intercept)Container_intercept <-rnorm(n_containers, 0, sqrt(0.1))# generation of response values on latent scale (!) based on fixed effects,# random effects and residual errorsColourL <-0.8+ (-0.6) * datm$Moisture +# fixed effect of Moisture Population_intercept[datm$Population] +# random intercept: Population Container_intercept[datm$Container] # random intercept: Container# data generation (on data scale!) based on binomial distributiondatm$Colour <-rbinom(length(ColourL), 1, plogis(ColourL))
We use glmer function in lme4 to fix a ‘full’ model, which completely match with the data generating process. The model includes a fixed effect of Moisture, and random intercepts for Container and Population.
Code
# fitting a mixed modelmorph_full <-glmer(Colour ~1+ Moisture + (1|Container) + (1|Population),family =binomial(link ="logit"),data = datm)summary(morph_full)## Generalized linear mixed model fit by maximum likelihood (Laplace## Approximation) [glmerMod]## Family: binomial ( logit )## Formula: Colour ~ 1 + Moisture + (1 | Container) + (1 | Population)## Data: datm## ## AIC BIC logLik -2*log(L) df.resid ## 1278.8 1298.7 -635.4 1270.8 1076 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.8762 -0.7389 0.3922 0.7478 2.1894 ## ## Random effects:## Groups Name Variance Std.Dev.## Container (Intercept) 0.2030 0.4505 ## Population (Intercept) 0.8095 0.8997 ## Number of obs: 1080, groups: Container, 36; Population, 12## ## Fixed effects:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 1.0734 0.3091 3.473 0.000515 ***## Moisture -0.6605 0.1282 -5.153 2.56e-07 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Correlation of Fixed Effects:## (Intr)## Moisture -0.425
We can also fit a reduced model, which only fit our target clustering variable, Population.
Code
# fitting a reduced modelmorph_reduced <-glmer(Colour ~1+ (1|Population), family =binomial(link ="logit"),data = datm)summary(morph_reduced)## Generalized linear mixed model fit by maximum likelihood (Laplace## Approximation) [glmerMod]## Family: binomial ( logit )## Formula: Colour ~ 1 + (1 | Population)## Data: datm## ## AIC BIC logLik -2*log(L) df.resid ## 1344.6 1354.6 -670.3 1340.6 1078 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -2.5536 -0.9101 0.4732 0.9286 1.5800 ## ## Random effects:## Groups Name Variance Std.Dev.## Population (Intercept) 0.7524 0.8674 ## Number of obs: 1080, groups: Population, 12## ## Fixed effects:## Estimate Std. Error z value Pr(>|z|)## (Intercept) 0.3952 0.2596 1.522 0.128
7.3 Variance decomposition
We now want to get variance for each fixed and random effect, including residuals. We can use just rptR as we only have one fixed effect so fixed effects do not need to be decomposed further.
# the same model as `size_full`morph_rpt_full <-rpt(Colour ~1+ Moisture + (1| Container) + (1| Population), grname =c("Container", "Population", "Fixed", "Residual"), data = datm, datatype ="Binary",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(morph_rpt_full, here("Rdata", "morph_rpt_full.rds"))
Code
# reading in Rdatamorph_rpt_full <-readRDS(here("Rdata", "morph_rpt_full.rds"))morph_rpt_full## ## ## Variance estimation using the glmm method and logit link ## ## Variance of Container## --------------------------------## Link-scale approximation:## Var = 0.203## SE = 0.114## CI = [0.002, 0.455]## P = 0.000755 [LRT]## 0.001 [Permutation]## ## Variance of Population## --------------------------------## Link-scale approximation:## Var = 0.81## SE = 0.381## CI = [0.184, 1.635]## P = 2.15e-05 [LRT]## 0.001 [Permutation]## ## Variance of Residual## --------------------------------## Link-scale approximation:## Var = 4.107## SE = 0.166## CI = [4, 4.586]## P = NA [LRT]## NA [Permutation]## ## Variance of Fixed## --------------------------------## Link-scale approximation:## Var = 0.291## SE = 0.115## CI = [0.12, 0.561]## P = NA [LRT]## NA [Permutation]
Code
# the same model as `size_full`morph_rpt_reduced <-rpt(Colour ~1+ (1|Population),grname =c("Population", "Residual"), data = datm, datatype ="Binary",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(morph_rpt_reduced, here("Rdata", "morph_rpt_reduced.rds"))
Code
# reading in Rdatamorph_rpt_reduced <-readRDS(here("Rdata", "morph_rpt_reduced.rds"))morph_rpt_reduced## ## ## Variance estimation using the glmm method and logit link ## ## Variance of Population## --------------------------------## Link-scale approximation:## Var = 0.752## SE = 0.345## CI = [0.186, 1.548]## P = 4.48e-30 [LRT]## NA [Permutation]## ## Variance of Residual## --------------------------------## Link-scale approximation:## Var = 4.107## SE = 0.181## CI = [4.001, 4.636]## P = NA [LRT]## NA [Permutation]
7.4 Repeatability calculation
Our target clustering variable is Population, and we want to estimate repeatability (R) as the ratio of the variance of the target clustering variable (Population) to the total variance.
Here we have two kinds of effects: 1) effects which we want to remove (adjust) from repeatability calculation (Container) and 2) effects which we want to retain (Moistureand Population).
Code
# this is mixture of Rv1 and Rv3# point estimate(morph_rpt_full$R[2, "Population"] + morph_rpt_full$R[2, "Fixed"])/ (morph_rpt_full$R[2, "Population"] + morph_rpt_full$R[2, "Fixed"] + morph_rpt_full$R[2, "Residual"])## [1] 0.2113642# 95% CI using bootstrappingCI <- (morph_rpt_full$R_boot_link$Population + morph_rpt_full$R_boot_link$Fixed) / (morph_rpt_full$R_boot_link$Population + morph_rpt_full$R_boot_link$Fixed + morph_rpt_full$R_boot_link$Residual)quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.09260895 0.32533574# Instead of using `Residual`, we could use pi^2/3, which is theoretical distribution-specific variance for family = binomial(logit)# point estimate(morph_rpt_full$R[2, "Population"] + morph_rpt_full$R[2, "Fixed"])/ (morph_rpt_full$R[2, "Population"] + morph_rpt_full$R[2, "Population"] + (pi^2)/3 )## [1] 0.2242083# 95% CI using bootstrappingCI <- (morph_rpt_full$R_boot_link$Population + morph_rpt_full$R_boot_link$Fixed) / (morph_rpt_full$R_boot_link$Population + morph_rpt_full$R_boot_link$Fixed + (pi^2)/3 )quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.116277 0.377135
Code
# ordinary repeatability# point estimatemorph_rpt_reduced$R[2, "Population"] / (morph_rpt_reduced$R[2, "Population"] + morph_rpt_reduced$R[2, "Residual"])## [1] 0.1548503# 95% CI using bootstrappingCI <- morph_rpt_reduced$R_boot_link$Population / (morph_rpt_reduced$R_boot_link$Population + morph_rpt_reduced$R_boot_link$Residual)quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.04342031 0.26678571# Instead of using `Residual`, we could use pi^2/3, which is theoretical distribution-specific variance for family = binomial(logit)# point estimatemorph_rpt_reduced$R[2, "Population"] / (morph_rpt_reduced$R[2, "Population"] + (pi^2)/3 )## [1] 0.186137# 95% CI using bootstrappingCI <- morph_rpt_reduced$R_boot_link$Population / (morph_rpt_reduced$R_boot_link$Population + (pi^2)/3 )quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.05346076 0.32003024
8 The number of eggs
The number of eggs laid by each female is Poisson-distributed.
8.1 Data generation
Code
# Setting the seedset.seed(7777)# Subset the design matrix (only females lay eggs)datf <- dat1[dat1$Sex =="Female", ]# Simulate random effects for PopulationsPopulation_intercept <-rnorm(n_populations, 0, sqrt(0.4))# Simulate random effect for containers (random intercept)Container_intercept <-rnorm(n_containers, 0, sqrt(0.05))# generation of response values on latent scale (!) based on fixed effects,# random effects and residual errorsEggL <-0.1+ (0.4) * datf$Moisture +# fixed effect of Moisture Population_intercept[datf$Population] +# random intercept: Population Container_intercept[datf$Container] +# random intercept: Containerrnorm(n_individuals/2, 0, sqrt(1)) # data generation (on data scale!) based on Poisson distributiondatf$Egg <-rpois(length(EggL), exp(EggL))
8.2 Model fitting
One note here is that we have to model overdispersion by fitting observation-level random effects (OLREs) for Individual in addition to the random intercepts for Container and Population. See the following papers for overdispersion and OLREs
Harrison, X. A. (2014). Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ, 2, e616.
Nakagawa, S., & Schielzeth, H. (2010). Repeatability for Gaussian and non‐Gaussian data: a practical guide for biologists. Biological Reviews, 85(4), 935-956.
We use glmer function in lme4 to fix a ‘full’ model, which completely match with the data generating process. The model includes a fixed effect of Moisture, and random intercepts for Container and Population along with random intercepts for Individual, which model overdispersion.
Code
# fitting a mixed model# Note ``egg_full <-glmer(Egg ~1+ Moisture + (1|Container) + (1|Population) + (1|Individual), data = datf, family = poisson)summary(egg_full)## Generalized linear mixed model fit by maximum likelihood (Laplace## Approximation) [glmerMod]## Family: poisson ( log )## Formula: Egg ~ 1 + Moisture + (1 | Container) + (1 | Population) + (1 | ## Individual)## Data: datf## ## AIC BIC logLik -2*log(L) df.resid ## 5512.1 5537.1 -2751.1 5502.1 1075 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -1.25085 -0.55220 -0.09204 0.20005 1.05208 ## ## Random effects:## Groups Name Variance Std.Dev.## Individual (Intercept) 1.06570 1.0323 ## Container (Intercept) 0.08617 0.2935 ## Population (Intercept) 0.41207 0.6419 ## Number of obs: 1080, groups: Individual, 1080; Container, 36; Population, 12## ## Fixed effects:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.37155 0.21088 1.762 0.0781 . ## Moisture 0.47777 0.07573 6.309 2.81e-10 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Correlation of Fixed Effects:## (Intr)## Moisture -0.372
We can also fit a reduced model, which fit our target clustering variable, Population as well as Individual to account for overdispersion.
Code
egg_reduced <-glmer(Egg ~1+ (1|Population) + (1|Individual), data = datf, family = poisson)summary(egg_reduced)## Generalized linear mixed model fit by maximum likelihood (Laplace## Approximation) [glmerMod]## Family: poisson ( log )## Formula: Egg ~ 1 + (1 | Population) + (1 | Individual)## Data: datf## ## AIC BIC logLik -2*log(L) df.resid ## 5623.3 5638.2 -2808.6 5617.3 1077 ## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -1.12131 -0.54957 -0.04214 0.20807 0.73381 ## ## Random effects:## Groups Name Variance Std.Dev.## Individual (Intercept) 1.2531 1.1194 ## Population (Intercept) 0.4373 0.6613 ## Number of obs: 1080, groups: Individual, 1080; Population, 12## ## Fixed effects:## Estimate Std. Error z value Pr(>|z|) ## (Intercept) 0.8543 0.1955 4.371 1.24e-05 ***## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
8.3 Variance decomposition
We now want to get variance for each fixed and random effect, including residuals. We can use just rptR as we only have one fixed effect so fixed effects do not need to be decomposed further.
# the same model as `egg_full` yet we do not need to fit `Individual` as OLREsegg_rpt_full <-rpt(Egg ~1+ Moisture + (1| Container) + (1| Population), grname =c("Container", "Population", "Fixed", "Residual"), data = datf, datatype ="Poisson",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(egg_rpt_full, here("Rdata", "egg_rpt_full.rds"))
Code
# reading in Rdataegg_rpt_full <-readRDS(here("Rdata", "egg_rpt_full.rds"))egg_rpt_full## ## ## Variance estimation using the glmm method and log link ## ## Variance of Container## --------------------------------## Link-scale approximation:## Var = 0.086## SE = 0.039## CI = [0.013, 0.168]## P = 2.95e-05 [LRT]## NA [Permutation]## ## Variance of Population## --------------------------------## Link-scale approximation:## Var = 0.412## SE = 0.18## CI = [0.077, 0.762]## P = 1.6e-06 [LRT]## NA [Permutation]## ## Variance of Residual## --------------------------------## Link-scale approximation:## Var = 1.235## SE = 0.073## CI = [1.065, 1.353]## P = NA [LRT]## NA [Permutation]## ## Variance of Fixed## --------------------------------## Link-scale approximation:## Var = 0.152## SE = 0.05## CI = [0.071, 0.264]## P = NA [LRT]## NA [Permutation]
Code
# the same model as `size_full`egg_rpt_reduced <-rpt(Egg ~1+ (1|Population),grname =c("Population", "Residual"), data = datf, datatype ="Poisson",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(egg_rpt_reduced, here("Rdata", "egg_rpt_reduced.rds"))
Code
# reading in Rdataegg_rpt_reduced <-readRDS(here("Rdata", "egg_rpt_reduced.rds"))egg_rpt_reduced## ## ## Variance estimation using the glmm method and log link ## ## Variance of Population## --------------------------------## Link-scale approximation:## Var = 0.437## SE = 0.183## CI = [0.105, 0.82]## P = 7.12e-40 [LRT]## NA [Permutation]## ## Variance of Residual## --------------------------------## Link-scale approximation:## Var = 1.423## SE = 0.079## CI = [1.232, 1.549]## P = NA [LRT]## NA [Permutation]
8.4 Repeatability calculation
Our target clustering variable is Population, and we want to estimate repeatability (R) as the ratio of the variance of the target clustering variable (Population) to the total variance.
Here we have two kinds of effects: 1) effects which we want to remove (adjust) from repeatability calculation (Container) and 2) effects which we want to retain (Moistureand Population; Note Individual would be included in Residual).
Code
# this is mixture of Rv1 and Rv3# point estimate(egg_rpt_full$R[2, "Population"] + egg_rpt_full$R[2, "Fixed"])/ (egg_rpt_full$R[2, "Population"] + egg_rpt_full$R[2, "Fixed"] + egg_rpt_full$R[2, "Residual"])## [1] 0.3136216# 95% CI using bootstrappingCI <- (egg_rpt_full$R_boot_link$Population + egg_rpt_full$R_boot_link$Fixed) / (egg_rpt_full$R_boot_link$Population + egg_rpt_full$R_boot_link$Fixed + egg_rpt_full$R_boot_link$Residual)quantile(CI, c(0.025, 0.975))## 2.5% 97.5% ## 0.1540324 0.4429450
The formula for conditional repeatability is (see the paper for more details):
\[
R_{conditional} = \frac{\sigma_{u_{0}}^2 + \sigma_{u_{1}}^2 x_{1^*} + 2\rho\sigma_{u_{0}}\sigma_{u_{1}} x_{1^*}}{\sigma_{u_{0}}^2 + \sigma_{u_{1}}^2 x_{1^*} + 2\rho\sigma_{u_{0}}\sigma_{u_{1}} x_{1^*} + \sigma^2_\epsilon}
\] Here we will use the model we used earlier size_full (the Body length model). Also, for simplicity, we will focus on Population and Residual, ignoring all the fixed effects and container (i.e, we calculate ‘adjusted’ repeatability). In the above formula, the variance for Population is \(\sigma_{u_{0}}^2\) while the variance for Residual is \(\sigma^2_\epsilon\). The variance for Moisture is \(\sigma_{u_{1}}^2\) and the covariance between Population and Moisture is \(\sigma^2_\epsilon\).
This is the model output
Code
summary(size_full)## Linear mixed model fit by REML. t-tests use Satterthwaite's method [## lmerModLmerTest]## Formula: BodyL ~ 1 + Sex + Moisture + Student + (1 | Container) + (1 + ## Moisture | Population)## Data: dat1## ## REML criterion at convergence: 6230.9## ## Scaled residuals: ## Min 1Q Median 3Q Max ## -3.1363 -0.6899 0.0214 0.6819 3.3819 ## ## Random effects:## Groups Name Variance Std.Dev. Corr ## Container (Intercept) 0.08078 0.2842 ## Population (Intercept) 3.04570 1.7452 ## Moisture 0.10063 0.3172 -0.56## Residual 0.97053 0.9852 ## Number of obs: 2160, groups: Container, 72; Population, 12## ## Fixed effects:## Estimate Std. Error df t value Pr(>|t|) ## (Intercept) 14.59595 0.66531 11.57950 21.938 8.58e-11 ***## SexMale -2.89866 0.07928 47.00097 -36.563 < 2e-16 ***## Moisture 0.46650 0.10365 10.99947 4.501 0.0009 ***## StudentB 1.39444 0.85635 9.99965 1.628 0.1345 ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1## ## Correlation of Fixed Effects:## (Intr) SexMal Moistr## SexMale -0.060 ## Moisture -0.410 0.000 ## StudentB -0.644 0.000 0.000
We have Moisture with 0, 1 and 2 and we can get 3 repeatability estimates using the formula above.
Code
sigma2_pop_int <-VarCorr(size_full)$Population[1, 1] # variance for Population interceptsigma2_pop_slope <-VarCorr(size_full)$Population[2, 2] # variance for Population slope sigma_pop_cov <-VarCorr(size_full)$Population[1, 2] # covariance between Population intercept and slopesigma2_residual <-attr(VarCorr(size_full), "sc")^2# residual variance# repeatability for `Moisture` = 0R_cond_0 <- (sigma2_pop_int + sigma2_pop_slope *0+2* sigma_pop_cov *0) / (sigma2_pop_int + sigma2_pop_slope *0+2* sigma_pop_cov *0+ sigma2_residual)# repeatability for `Moisture` = 1R_cond_1 <- (sigma2_pop_int + sigma2_pop_slope *1+2* sigma_pop_cov *1) / (sigma2_pop_int + sigma2_pop_slope *1+2* sigma_pop_cov *1+ sigma2_residual)# repeatability for `Moisture` = 2R_cond_2 <- (sigma2_pop_int + sigma2_pop_slope *2+2* sigma_pop_cov *2) / (sigma2_pop_int + sigma2_pop_slope *2+2* sigma_pop_cov *2+ sigma2_residual)# printing outR_cond_0## [1] 0.7583483R_cond_1## [1] 0.7223008R_cond_2## [1] 0.6736132
10 Bonus 2: Creating the same dataset using squidSim
Code
# installing and loading squidSim - https://github.com/squidgroup/squidSim#devtools::install_github("squidgroup/squidSim")library(squidSim)library(lme4)## First Example### DATA STRUCTURE## Two students, each do 6 populations (12 in total), each with 30 individuals, in each combination of the two sexes and three moisture treatments (giving unique individuals 2160 in total). The unique combination of Population, Sex and Moisture is the Container (72 in total) and the unique combination of ID within a container, Sex and Moisture is the Individual.ds1 <-make_structure("Student(2)/Population(6)/ID(30) + Sex(2)+ Moisture(3) + Population:Sex:Moisture + ID:Sex:Moisture",repeat_obs=1,level_names=list(Student=c("A","B"), Sex=c("Female","Male"),Moisture=0:2 ),int_names =c("Container","Individual") )head(ds1)### DATA GENERATIONfull_sim1 <-simulate_population(seed=22376,data_structure = ds1,response_name="BodyL",parameters =list(intercept=15,Population =list(names =c("Population_intercept","Population_slope"),vcov =c(1.3,0.1),beta=c(1,0) # slopes not included in main additive part ),Container =list(names=c("Container_intercept"),vcov=0.1 ),Sex =list(fixed=TRUE, # tells squidSim it is a factornames =c("Female","Male"),beta =c(0,-3) # Female is baseline, then difference between baseline Female and Male ),Student =list(fixed=TRUE, # tells squidSim it is a factornames =c("A","B"),beta =c(0,0.05) # A is baseline, then difference between baseline A and B ),Moisture =list(covariate=TRUE, # tells squidSim it is continuous (not a grouping factor)names =c("Moisture_cont"),beta =c(0.4) ),interactions =list( names ="Population_slope:Moisture_cont", #random slopesbeta=1 ),residual=list(vcov =1 ) ))dat1 <-get_population_data(full_sim1)size_full <-lmer(BodyL ~1+ Sex + Moisture + Student + (1|Container) + (1+ Moisture|Population),data = dat1)summary(size_full)## EXAMPLE 2 - Activity levelds2 <-make_structure("Population(12)/ID(30) + Sex(2)+ Moisture(3) + Population:Sex:Moisture + ID:Sex:Moisture",repeat_obs=3, ## now with 3 obs per individuallevel_names=list(Sex=c("Female","Male"),Moisture=0:2 ),int_names =c("Container","Individual") )full_sim2 <-simulate_population(seed=4242,data_structure = ds2,response_name="Activity",link="log",parameters =list(intercept=0.1,Population =list(names =c("Population_intercept","Population_slope"),vcov =c(1.3,0.1),beta=c(1,0) # slopes not included in main additive part ),Container =list(vcov=0.1 ),Individual =list(vcov=2 ),Sex =list(fixed=TRUE, # tells squidSim it is a factornames =c("Female","Male"),beta =c(0,1.2) # Female is baseline, then difference between baseline Female and Male ),Moisture =list(covariate=TRUE, # tells squidSim it is continuous (not a grouping factor)names =c("Moisture_cont"),beta =c(0.3) ),interactions =list( names ="Population_slope:Moisture_cont", #random slopesbeta=1 ),residual=list(vcov =0.5 ) ))dat2 <-get_population_data(full_sim2)act_full <-lmer(log(Activity) ~1+ Sex + Moisture + (1|Container) + (1+ Moisture|Population) + (1| Individual), data = dat2)summary(act_full)## example 3 - Male Morphds_m <- ds1[ds1$Sex =="Male", ]full_sim3 <-simulate_population(seed=125,data_structure = ds_m,response_name="Colour",link="logit",family="binomial",parameters =list(intercept=0.8,Population =list(vcov =1.2 ),Container =list(vcov=0.1 ),Moisture =list(covariate=TRUE, # tells squidSim it is continuous (not a grouping factor)beta =c(-0.6) ),residual=list(vcov =0 ) ))datm <-get_population_data(full_sim3)morph_full <-glmer(Colour ~1+ Moisture + (1|Container) + (1|Population),family =binomial(link ="logit"),data = datm)summary(morph_full)## example 4 - Egg numberds_f <- ds1[ds1$Sex =="Female", ]full_sim4 <-simulate_population(seed=125,data_structure = ds_f,response_name="Egg",link="log",family="poisson",parameters =list(intercept=0.1,Population =list(vcov =0.4 ),Container =list(vcov=0.05 ),Moisture =list(covariate=TRUE, # tells squidSim it is continuous (not a grouping factor)beta =c(0.4) ),residual=list(vcov =1 ) ))datf <-get_population_data(full_sim4)egg_full <-glmer(Egg ~1+ Moisture + (1|Container) + (1|Population) + (1|Individual), data = datf, family = poisson)summary(egg_full)
11 Software and package versions
Code
sessioninfo::session_info() %>% details::details(summary ='Current session info',open = F ) %>%pander()## Warning in pander.default(.): No pander.method for "details_console", reverting## to default.
---title: "ONLINE SUPPLEMENT: Understanding different types of repeatability and intra-class correlation for an analysis of biological variation"author: "**Shinichi Nakagawa, David F. Westneat, Ayumi Mizuno, Yimen G. Araya-Ajoy, Barbara Class, Niels Dingemanse, Ned A. Dochtermann, Malgorzata Lagisz, Kate L. Laskowski, Joel L. Pick, Denis Réale, Coralie Williams, Jonathan Wright and Holger Schielzeth**"format: html: toc: true toc-location: left toc-depth: 3 toc-title: "**Table of Contents**" output-file: "index.html" theme: simplex embed-resources: true code-fold: show code-tools: true number-sections: true #bibliography: ./bib/ref.bib fontsize: "12" max-width: "10" code-overflow: wrapcrossref: fig-title: Figure # (default is "Figure") tbl-title: Table # (default is "Table") title-delim: — # (default is ":") fig-prefix: Fig. # (default is "Figure") tbl-prefix: Tab. # (default is "Table")editor_options: chunk_output_type: consoleeditor: markdown: wrap: sentence---```{r setup}#| include: falseknitr::opts_chunk$set(collapse =TRUE,message =FALSE,warnings =FALSE,echo = TRUE#,#comment = "#>")```# IntroductionThis online material is a supplement to our paper "Understanding different types of repeatabilty and intra-class correlation for an analysis of biological variation". We will show how to obtain different types of repeatability (intra-class correlation) using the `lme4`, `rptR` and `partR2` packages in `R`.# ContentsIn this online material, we will illustrate how to get repeatabilty for four traits: 1) body length of adult beetles (Gaussian distribution, *N* = 2,160), 2) frequencies of the two distinct male colour morphs (binomial or Bernoulli distribution, *N* = 1,080), 3) the number of eggs laid by each female (Poisson distribution) after random mating (*N* = 1,080) and 4) the activity levels (log-normally distributed) with three observations per individual (*N* = 6,480). After generating these data, we will:- Use `lme4` to fit a full model, which match the data generating process, and a reduced model, which only fit the target clustering variable (e.g., `Population`).- Use `rptR` and `partR2` to obtain variance component for each fixed and random effect including residuals. - By using estimated variance components, estimate different types of repeatability (e.g., adjusted, unadjusted and marginal repeatability)In addition, we show how to obtain conditional repeatability for body length at different levels of moisture (0, 1 and 2). Also, importantly, both `rptR` and `partR2` packages have vignettes that provide detailed information on how to use them and associated publications.The `rptR` package is available on CRAN and the vignette can be found [here](https://cran.r-project.org/web/packages/rptR/vignettes/rptR.html).> Stoffel, M. A., Nakagawa, S., & Schielzeth, H. (2017). rptR: Repeatability estimation and variance decomposition by generalized linear mixed‐effects models. Methods in ecology and evolution, 8(11), 1639-1644.The `partR2` package is available on CRAN and the vignette can be found [here](https://cran.r-project.org/web/packages/partR2/vignettes/Using_partR2.html).> Stoffel, M. A., Nakagawa, S., & Schielzeth, H. (2021). partR2: Partitioning R2 in generalized linear mixed models. PeerJ, 9, e11414.# Prerequisites## Loading packagesOur tutorial uses `R` statistical software and existing `R` packages, which you will first need to download and install.If the packages are archived in CRAN, use `install.packages()` to install them.For example, to install the `lme4` , you can execute `install.packages("lme4")` in the console (bottom left pane of `R Studio`).Version information of each package is listed at the end of this tutorial.```{r packages}# clearing memoryrm(list =ls())# attempt to load or install necessary packagesif(!require(pacman)) install.packages("pacman")pacman::p_load(lme4, lmerTest, rptR, partR2, pander, here, sessioninfo, details)```# Basic data and preparationHere we create identifiers for `Population`, `Container`, `Individual`, `Sex`, `Moisture` and `Student` for 12 populations, 72 containers, 30 individuals per container, and 3 conditions (moisture levels). The data are generated at the individual level and then expanded to the observation level with three observations per individual.```{r}# number of thingsn_populations <-12# number of populationsn_containers <-72# number of containersn_samples <-30# number of samples per containern_individuals <-2160# number of individuals# Data at the individual level# 12 different populations Population <-gl(n_populations, n_samples*2*3, n_individuals) # 2 sex and 3 conditions# 120 containers (30 individuals in each container)Container <-gl(n_containers, n_samples, n_individuals)# Individual IDIndividual <-factor(1:n_individuals)# Sex of the individuals. Uni-sex within each container (individuals are sorted at the pupa stage)Sex <-factor(rep(c(rep("Female", n_samples), rep("Male", n_samples)), n_containers/2))# Observer (person A and B)Student <-factor(rep(c("A", "B"), each = n_individuals/2))# Moisture levelsMoisture <-rep(rep(0:2, each=60), n_populations) # 3 levels of moisture (1, 2, 3)# the first basic data set with matrix of 2160 x 6dat1 <-data.frame(Population = Population, Individual = Individual, Container = Container, Sex = Sex, Moisture = Moisture, Student = Student)# Data at the observation level#Individual <- factor(rep(1:n_individuals, 3)) # 3 observations per individual# Basic data set with a matrix of 6480 x 5 dat2 <-rbind(dat1, dat1, dat1) # 3 observations per individualdat2 <- dat2[, -6] # taking out student info (as activity level was measured by a machine)```# Body lengh Body length is a continuous trait that is normally distributed.## Data generation```{r}# Setting the seedset.seed(22376)# Simulate random effects for PopulationsPopulation_intercept <-rnorm(n_populations, 0, sqrt(1.3)) # variance of 1.3Population_slope <-rnorm(n_populations, 0, sqrt(0.1)) # variance of 0.1# Simulate random effect for containers (random intercept)Container_intercept <-rnorm(n_containers, 0, sqrt(0.1)) # variance of 0.1# Generate outcome (BodyL) with both random intercept and random slope for Moisture per populationdat1$BodyL <-15-3* (as.numeric(dat1$Sex) -1) +0.4* dat1$Moisture +# fixed effect of Moisture0.05* (as.numeric(dat1$Student) -1) +# fixed effect of Student Population_intercept[dat1$Population] +# random intercept: Population Population_slope[dat1$Population] * dat1$Moisture +# random slope: Population x Moisture Container_intercept[dat1$Container] +# random intercept: Containerrnorm(n_individuals, 0, sqrt(1)) # residual (variance of 1)```## Model fitting::: panel-tabset### Full modelWe use the `lmer` function in `lme4` to fix a 'full' model, which completely match with the data generating process. The model includes fixed effects of `Sex`, `Moisture` and `Student`, and random intercepts for `Container` and random slopes for `Moisture` by `Population`.Our target clustering (grouping) variable is `Population`, for which we want to estimate repeatability.```{r}# fitting a mixed modelsize_full <-lmer(BodyL ~1+ Sex + Moisture + Student + (1|Container) + (1+ Moisture|Population),data = dat1)summary(size_full)```### Reduced modelWe can also fit a reduced model, which only fit our target clustering variable, `Population`.```{r}# fitting a reduced modelsize_reduced <-lmer(BodyL ~1+ (1|Population), data = dat1)summary(size_reduced)```:::## Variance decompositionWe now want to get variance for each fixed and random effect, including residuals. We can use the `rptR` and `partR2` package to do this.::: panel-tabset### Full model```{r}#| eval: false# the same model as `size_full`size_rpt_full <-rpt(BodyL ~1+ Student + Sex + Moisture + (1| Container) + (1+ Moisture| Population), grname =c("Container", "Population", "Fixed", "Residual"), data = dat1, datatype ="Gaussian",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(size_rpt_full, here("Rdata", "size_rpt_full.rds"))# partR2 cannot process a random slope model so take out# we are assuming that fixed variances are teh same with or without the random slopesize_full2 <-lmer(BodyL ~1+ Sex + Moisture + Student + (1|Container) + (1|Population),data = dat1)# No random slopessize_part <-partR2(size_full2, partvars =c("Moisture", "Sex", "Student"),nboot =NULL)# savingsaveRDS(size_part, here("Rdata", "size_part.rds"))``````{r}# reading in Rdatasize_rpt_full <-readRDS(here("Rdata", "size_rpt_full.rds"))size_part <-readRDS(here("Rdata", "size_part.rds"))size_rpt_fullsize_part```### Reduced model```{r}#| eval: false# the same model as `size_full`size_rpt_reduced <-rpt(BodyL ~1+ (1|Population),grname =c("Population", "Residual"), data = dat1, datatype ="Gaussian",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(size_rpt_reduced, here("Rdata", "size_rpt_reduced.rds"))``````{r}# reading in Rdatasize_rpt_reduced <-readRDS(here("Rdata", "size_rpt_reduced.rds"))size_rpt_reduced```:::## Repeatability calculationOur target clustering variable is `Population`, and we want to estimate repeatability (R) as the ratio of the variance of the target clustering variable (Population) to the total variance. ::: panel-tabset### Full modelHere we have two kinds of effects: 1) effects which we want to remove (adjust) from repeatability calculation (`Student` and `Container`) and 2) effects which we want to retain (`Sex`, `Moisture`and `Population`).```{r}# this is mixture of Rv1, Rv2 and Rv3# also, this repeatability is marginalized in relation to `Moisture`# getting adjusted fixed effect variance.# getting % of `Sex` and `Moisture in relation to the total fixed effect varianceadj_factor <- size_part$R2[5 ,"estimate"][[1]]/ size_part$R2[8 ,"estimate"][[1]]# getting % of # variance for `Sex`, `Moisture'Fixed_point <- size_rpt_full$R[1, "Fixed"] * adj_factor Fixed_boot <- size_rpt_full$R_boot[, "Fixed"] * adj_factor# getting % of 'Moisture' to the total fixed effect varianceadj_factor2 <- size_part$R2[2 ,"estimate"][[1]]/ size_part$R2[8 ,"estimate"][[1]]# variance for `Moisture'Fixed_point2 <- size_rpt_full$R[1, "Fixed"] * adj_factor2 Fixed_boot2 <- size_rpt_full$R_boot[, "Fixed"] * adj_factor2# point estimate(size_rpt_full$R[1, "Population"] + Fixed_point)/ (size_rpt_full$R[1, "Population"] + Fixed_point + size_rpt_full$R[1, "Residual"])# 95% CI using bootstrappingCI <- (size_rpt_full$R_boot[, "Population"] + Fixed_boot) / (size_rpt_full$R_boot[, "Population"] + Fixed_boot + size_rpt_full$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))# repeatability - now removing `Sex` effect from numerator but not denominator# point estimate(size_rpt_full$R[1, "Population"] + Fixed_point2)/ (size_rpt_full$R[1, "Population"] + Fixed_point + size_rpt_full$R[1, "Residual"])# 95% CI using bootstrappingCI <- (size_rpt_full$R_boot[, "Population"] + Fixed_boot2) / (size_rpt_full$R_boot[, "Population"] + Fixed_boot + size_rpt_full$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))# Extra - variance component for `Moisture` `Sex` and `Student`.# SexFixed_point - Fixed_point2# MoistureFixed_point2# Studentsize_rpt_full$R[1, "Fixed"] - Fixed_point```### Reduced model```{r}# ordinary repeatability# point estimatesize_rpt_reduced$R[1, "Population"] / (size_rpt_reduced$R[1, "Population"] + size_rpt_reduced$R[1, "Residual"])# 95% CI using bootstrappingCI <- size_rpt_reduced$R_boot[, "Population"] / (size_rpt_reduced$R_boot[, "Population"] + size_rpt_reduced$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))```:::# Activity levelActivity level (log-transformed) is a continuous trait (i.e., mm traveled) that is log-normally distributed. We have three observations per individual.## Data generation```{r}# Setting the seedset.seed(4242)# Simulate random effects for PopulationsPopulation_intercept <-rnorm(n_populations, 0, sqrt(1.3))Population_slope <-rnorm(n_populations, 0, sqrt(0.1)) # random slope variance (adjustable)# Simulate random effect for containers (random intercept)Container_intercept <-rnorm(n_containers, 0, sqrt(0.1))# Individual interceptIndividual_intercept <-rnorm(n_individuals, 0, sqrt(2))dat2$Activity <-exp(0.1+1.2* (as.numeric(dat2$Sex) -1) +0.3* dat2$Moisture +# fixed effect of Moisture Population_intercept[dat2$Population] +# random intercept: Population Population_slope[dat2$Population] * dat2$Moisture +# random slope: Population x Moisture Container_intercept[dat2$Container] +# random intercept: Container Individual_intercept[dat2$Individual] +# random intercept: Individualrnorm(n_individuals*3, 0, sqrt(0.5)) ) # residual```## Model fitting::: panel-tabset### Full modelWe use `lmer` function in `lme4` to fix a 'full' model, which completely match with the data generating process. The model includes fixed effects of `Sex`, `Moisture`, and random intercepts for `Container` and random slopes for `Moisture` by `Population`, and random intercepts for `Individual`.```{r}# fitting a mixed modelact_full <-lmer(log(Activity) ~1+ Sex + Moisture + (1|Container) + (1+ Moisture|Population) + (1| Individual), data = dat2)summary(act_full)```### Reduced modelWe can also fit a reduced model, which only fit our target clustering variable, `Indiviudal`.```{r}# fitting a reduced modelact_reduced <-lmer(log(Activity) ~1+ (1|Individual), data = dat2)summary(act_reduced)```:::## Variance decompositionWe now want to get variance for each fixed and random effect, including residuals. We can use the `rptR` and `partR2` package to do this.::: panel-tabset### Full model```{r}#| eval: false# the same model as `size_full`act_rpt_full <-rpt(log(Activity) ~1+ Sex + Moisture + (1|Container) + (1+ Moisture|Population) + (1| Individual), grname =c("Container", "Population", "Individual", "Fixed", "Residual"), data = dat2, datatype ="Gaussian",nboot =1000, npermut =1000, ratio =FALSE)# savingsaveRDS(act_rpt_full, here("Rdata", "act_rpt_full.rds"))# partR2 cannot process a random slope model so take outact_full2 <-lmer(log(Activity) ~1+ Sex + Moisture + (1|Container) + (1|Population) + (1| Individual), data = dat2)# No random slopes# we are assuming that fixed variances are teh same with or without the random slopeact_part <-partR2(act_full2, partvars =c("Moisture", "Sex"),nboot =NULL)# savingsaveRDS(act_part, here("Rdata", "act_part.rds"))``````{r}# reading in Rdataact_rpt_full <-readRDS(here("Rdata", "act_rpt_full.rds"))act_part <-readRDS(here("Rdata", "act_part.rds"))act_rpt_fullact_part```### Reduced model```{r}#| eval: false# the same model as `size_full`act_rpt_reduced <-rpt(log(Activity) ~1+ (1| Individual), grname =c("Individual", "Residual"), data = dat2, datatype ="Gaussian",nboot =1000, npermut =0,ratio =FALSE)# savingsaveRDS(act_rpt_reduced, here("Rdata", "act_rpt_reduced.rds"))``````{r}# reading in Rdataact_rpt_reduced <-readRDS(here("Rdata", "act_rpt_reduced.rds"))act_rpt_reduced```:::## Repeatability calculationOur target clustering variable is `individual`, and we want to estimate repeatability (R) as the ratio of the variance of the target clustering variable (individual) to the total variance. ::: panel-tabset### Full modelHere we have two kinds of effects: 1) effects which we want to remove (adjust) from repeatability calculation (`Container`) and 2) effects which we want to retain (`Sex`, `Moisture`, `Population` and `individual`).```{r}# this is mixture of Rv1, Rv2 and Rv3# also, this repeatability is marginalized in relation to `Moisture`# getting adjusted fixed effect variance.# getting % of 'Moisture' to the total fixed effect varianceadj_factor3 <- act_part$R2[2 ,"estimate"][[1]]/ act_part$R2[4 ,"estimate"][[1]]# variance for `Moisture'Fixed_point3 <- act_rpt_full$R[1, "Fixed"] * adj_factor3 Fixed_boot3 <- act_rpt_full$R_boot[, "Fixed"] * adj_factor3# point estimate(act_rpt_full$R[1, "Individual"] + act_rpt_full$R[1, "Population"] + act_rpt_full$R[1, "Fixed"])/ (act_rpt_full$R[1, "Individual"] + act_rpt_full$R[1, "Population"]+ act_rpt_full$R[1, "Fixed"] + act_rpt_full$R[1, "Residual"])# 95% CI using bootstrappingCI <- (act_rpt_full$R_boot[, "Individual"] + act_rpt_full$R_boot[, "Population"] + act_rpt_full$R_boot[, "Fixed"])/ (act_rpt_full$R_boot[, "Individual"] + act_rpt_full$R_boot[, "Population"] + act_rpt_full$R_boot[, "Fixed"] + act_rpt_full$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))# repeatability - now removing `Sex` effect from numerator but not denominator# point estimate# point estimate(act_rpt_full$R[1, "Individual"] + act_rpt_full$R[1, "Population"] + Fixed_point3)/ (act_rpt_full$R[1, "Individual"] + act_rpt_full$R[1, "Population"]+ act_rpt_full$R[1, "Fixed"] + act_rpt_full$R[1, "Residual"])# 95% CI using bootstrappingCI <- (act_rpt_full$R_boot[, "Individual"] + act_rpt_full$R_boot[, "Population"] + Fixed_boot3)/ (act_rpt_full$R_boot[, "Individual"] + act_rpt_full$R_boot[, "Population"] + act_rpt_full$R_boot[, "Fixed"] + act_rpt_full$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))# Extra - variance component for `Moisture` and `Sex`. # MoistureFixed_point3# Sexact_rpt_full$R[1, "Fixed"] - Fixed_point3```### Reduced model```{r}# ordinary repeatability# point estimateact_rpt_reduced$R[1, "Individual"] / (act_rpt_reduced$R[1, "Individual"] + act_rpt_reduced$R[1, "Residual"])# 95% CI using bootstrappingCI <- act_rpt_reduced$R_boot[, "Individual"] / (act_rpt_reduced$R_boot[, "Individual"] + act_rpt_reduced$R_boot[, "Residual"])quantile(CI, c(0.025, 0.975))```:::# Male morphThe frequencies of the two distinct male colour morphs: black (0) or reddish (1) is a binomial or Bernoulli trait.## Data generation```{r}# Setting the seedset.seed(123)# Subset the design matrix (only males express colour morphs)datm <- dat1[dat1$Sex =="Male", ]# Simulate random effects for PopulationsPopulation_intercept <-rnorm(n_populations, 0, sqrt(1.2))# Simulate random effect for containers (random intercept)Container_intercept <-rnorm(n_containers, 0, sqrt(0.1))# generation of response values on latent scale (!) based on fixed effects,# random effects and residual errorsColourL <-0.8+ (-0.6) * datm$Moisture +# fixed effect of Moisture Population_intercept[datm$Population] +# random intercept: Population Container_intercept[datm$Container] # random intercept: Container# data generation (on data scale!) based on binomial distributiondatm$Colour <-rbinom(length(ColourL), 1, plogis(ColourL))```## Model fitting::: panel-tabset### Full modelWe use `glmer` function in `lme4` to fix a 'full' model, which completely match with the data generating process. The model includes a fixed effect of `Moisture`, and random intercepts for `Container` and `Population`.```{r}# fitting a mixed modelmorph_full <-glmer(Colour ~1+ Moisture + (1|Container) + (1|Population),family =binomial(link ="logit"),data = datm)summary(morph_full)```### Reduced modelWe can also fit a reduced model, which only fit our target clustering variable, `Population`.```{r}# fitting a reduced modelmorph_reduced <-glmer(Colour ~1+ (1|Population), family =binomial(link ="logit"),data = datm)summary(morph_reduced)```:::## Variance decompositionWe now want to get variance for each fixed and random effect, including residuals. We can use just `rptR` as we only have one fixed effect so fixed effects do not need to be decomposed further. ::: panel-tabset### Full model```{r}#| eval: false# the same model as `size_full`morph_rpt_full <-rpt(Colour ~1+ Moisture + (1| Container) + (1| Population), grname =c("Container", "Population", "Fixed", "Residual"), data = datm, datatype ="Binary",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(morph_rpt_full, here("Rdata", "morph_rpt_full.rds"))``````{r}# reading in Rdatamorph_rpt_full <-readRDS(here("Rdata", "morph_rpt_full.rds"))morph_rpt_full```### Reduced model```{r}#| eval: false# the same model as `size_full`morph_rpt_reduced <-rpt(Colour ~1+ (1|Population),grname =c("Population", "Residual"), data = datm, datatype ="Binary",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(morph_rpt_reduced, here("Rdata", "morph_rpt_reduced.rds"))``````{r}# reading in Rdatamorph_rpt_reduced <-readRDS(here("Rdata", "morph_rpt_reduced.rds"))morph_rpt_reduced```:::## Repeatability calculationOur target clustering variable is `Population`, and we want to estimate repeatability (R) as the ratio of the variance of the target clustering variable (Population) to the total variance. ::: panel-tabset### Full modelHere we have two kinds of effects: 1) effects which we want to remove (adjust) from repeatability calculation (`Container`) and 2) effects which we want to retain (`Moisture`and `Population`).```{r}# this is mixture of Rv1 and Rv3# point estimate(morph_rpt_full$R[2, "Population"] + morph_rpt_full$R[2, "Fixed"])/ (morph_rpt_full$R[2, "Population"] + morph_rpt_full$R[2, "Fixed"] + morph_rpt_full$R[2, "Residual"])# 95% CI using bootstrappingCI <- (morph_rpt_full$R_boot_link$Population + morph_rpt_full$R_boot_link$Fixed) / (morph_rpt_full$R_boot_link$Population + morph_rpt_full$R_boot_link$Fixed + morph_rpt_full$R_boot_link$Residual)quantile(CI, c(0.025, 0.975))# Instead of using `Residual`, we could use pi^2/3, which is theoretical distribution-specific variance for family = binomial(logit)# point estimate(morph_rpt_full$R[2, "Population"] + morph_rpt_full$R[2, "Fixed"])/ (morph_rpt_full$R[2, "Population"] + morph_rpt_full$R[2, "Population"] + (pi^2)/3 )# 95% CI using bootstrappingCI <- (morph_rpt_full$R_boot_link$Population + morph_rpt_full$R_boot_link$Fixed) / (morph_rpt_full$R_boot_link$Population + morph_rpt_full$R_boot_link$Fixed + (pi^2)/3 )quantile(CI, c(0.025, 0.975))```### Reduced model```{r}# ordinary repeatability# point estimatemorph_rpt_reduced$R[2, "Population"] / (morph_rpt_reduced$R[2, "Population"] + morph_rpt_reduced$R[2, "Residual"])# 95% CI using bootstrappingCI <- morph_rpt_reduced$R_boot_link$Population / (morph_rpt_reduced$R_boot_link$Population + morph_rpt_reduced$R_boot_link$Residual)quantile(CI, c(0.025, 0.975))# Instead of using `Residual`, we could use pi^2/3, which is theoretical distribution-specific variance for family = binomial(logit)# point estimatemorph_rpt_reduced$R[2, "Population"] / (morph_rpt_reduced$R[2, "Population"] + (pi^2)/3 )# 95% CI using bootstrappingCI <- morph_rpt_reduced$R_boot_link$Population / (morph_rpt_reduced$R_boot_link$Population + (pi^2)/3 )quantile(CI, c(0.025, 0.975))```:::# The number of eggsThe number of eggs laid by each female is Poisson-distributed. ## Data generation```{r}# Setting the seedset.seed(7777)# Subset the design matrix (only females lay eggs)datf <- dat1[dat1$Sex =="Female", ]# Simulate random effects for PopulationsPopulation_intercept <-rnorm(n_populations, 0, sqrt(0.4))# Simulate random effect for containers (random intercept)Container_intercept <-rnorm(n_containers, 0, sqrt(0.05))# generation of response values on latent scale (!) based on fixed effects,# random effects and residual errorsEggL <-0.1+ (0.4) * datf$Moisture +# fixed effect of Moisture Population_intercept[datf$Population] +# random intercept: Population Container_intercept[datf$Container] +# random intercept: Containerrnorm(n_individuals/2, 0, sqrt(1)) # data generation (on data scale!) based on Poisson distributiondatf$Egg <-rpois(length(EggL), exp(EggL))```## Model fittingOne note here is that we have to model overdispersion by fitting *observation-level random effect*s (OLREs) for `Individual` in addition to the random intercepts for `Container` and `Population`. See the following papers for overdispersion and OLREs> Harrison, X. A. (2014). Using observation-level random effects to model overdispersion in count data in ecology and evolution. PeerJ, 2, e616.> Nakagawa, S., & Schielzeth, H. (2010). Repeatability for Gaussian and non‐Gaussian data: a practical guide for biologists. Biological Reviews, 85(4), 935-956.::: panel-tabset### Full modelWe use `glmer` function in `lme4` to fix a 'full' model, which completely match with the data generating process. The model includes a fixed effect of `Moisture`, and random intercepts for `Container` and `Population` along with random intercepts for `Individual`, which model overdispersion. ```{r}# fitting a mixed model# Note ``egg_full <-glmer(Egg ~1+ Moisture + (1|Container) + (1|Population) + (1|Individual), data = datf, family = poisson)summary(egg_full)```### Reduced modelWe can also fit a reduced model, which fit our target clustering variable, `Population` as well as `Individual` to account for overdispersion.```{r}egg_reduced <-glmer(Egg ~1+ (1|Population) + (1|Individual), data = datf, family = poisson)summary(egg_reduced)```:::## Variance decompositionWe now want to get variance for each fixed and random effect, including residuals. We can use just `rptR` as we only have one fixed effect so fixed effects do not need to be decomposed further. ::: panel-tabset### Full model```{r}#| eval: false# the same model as `egg_full` yet we do not need to fit `Individual` as OLREsegg_rpt_full <-rpt(Egg ~1+ Moisture + (1| Container) + (1| Population), grname =c("Container", "Population", "Fixed", "Residual"), data = datf, datatype ="Poisson",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(egg_rpt_full, here("Rdata", "egg_rpt_full.rds"))``````{r}# reading in Rdataegg_rpt_full <-readRDS(here("Rdata", "egg_rpt_full.rds"))egg_rpt_full```### Reduced model```{r}#| eval: false# the same model as `size_full`egg_rpt_reduced <-rpt(Egg ~1+ (1|Population),grname =c("Population", "Residual"), data = datf, datatype ="Poisson",nboot =1000, npermut =0, ratio =FALSE)# savingsaveRDS(egg_rpt_reduced, here("Rdata", "egg_rpt_reduced.rds"))``````{r}# reading in Rdataegg_rpt_reduced <-readRDS(here("Rdata", "egg_rpt_reduced.rds"))egg_rpt_reduced```:::## Repeatability calculationOur target clustering variable is `Population`, and we want to estimate repeatability (R) as the ratio of the variance of the target clustering variable (Population) to the total variance. ::: panel-tabset### Full modelHere we have two kinds of effects: 1) effects which we want to remove (adjust) from repeatability calculation (`Container`) and 2) effects which we want to retain (`Moisture`and `Population`; Note `Individual` would be included in `Residual`).```{r}# this is mixture of Rv1 and Rv3# point estimate(egg_rpt_full$R[2, "Population"] + egg_rpt_full$R[2, "Fixed"])/ (egg_rpt_full$R[2, "Population"] + egg_rpt_full$R[2, "Fixed"] + egg_rpt_full$R[2, "Residual"])# 95% CI using bootstrappingCI <- (egg_rpt_full$R_boot_link$Population + egg_rpt_full$R_boot_link$Fixed) / (egg_rpt_full$R_boot_link$Population + egg_rpt_full$R_boot_link$Fixed + egg_rpt_full$R_boot_link$Residual)quantile(CI, c(0.025, 0.975))```### Reduced model```{r}# ordinary repeatability# point estimateegg_rpt_reduced$R[2, "Population"] / (egg_rpt_reduced$R[2, "Population"] + egg_rpt_reduced$R[2, "Residual"])# 95% CI using bootstrappingCI <- egg_rpt_reduced$R_boot_link$Population / (egg_rpt_reduced$R_boot_link$Population + egg_rpt_reduced$R_boot_link$Residual)quantile(CI, c(0.025, 0.975))```:::# Bonus 1: Conditional repeatability (body length)The formula for conditional repeatability is (see the paper for more details):$$R_{conditional} = \frac{\sigma_{u_{0}}^2 + \sigma_{u_{1}}^2 x_{1^*} + 2\rho\sigma_{u_{0}}\sigma_{u_{1}} x_{1^*}}{\sigma_{u_{0}}^2 + \sigma_{u_{1}}^2 x_{1^*} + 2\rho\sigma_{u_{0}}\sigma_{u_{1}} x_{1^*} + \sigma^2_\epsilon}$$Here we will use the model we used earlier `size_full` (the Body length model). Also, for simplicity, we will focus on `Population` and `Residual`, ignoring all the fixed effects and container (i.e, we calculate 'adjusted' repeatability). In the above formula, the variance for Population is $\sigma_{u_{0}}^2$ while the variance for Residual is $\sigma^2_\epsilon$. The variance for `Moisture` is $\sigma_{u_{1}}^2$ and the covariance between `Population` and `Moisture` is $\sigma^2_\epsilon$.This is the model output```{r}summary(size_full)```We have `Moisture` with 0, 1 and 2 and we can get 3 repeatability estimates using the formula above.```{r}sigma2_pop_int <-VarCorr(size_full)$Population[1, 1] # variance for Population interceptsigma2_pop_slope <-VarCorr(size_full)$Population[2, 2] # variance for Population slope sigma_pop_cov <-VarCorr(size_full)$Population[1, 2] # covariance between Population intercept and slopesigma2_residual <-attr(VarCorr(size_full), "sc")^2# residual variance# repeatability for `Moisture` = 0R_cond_0 <- (sigma2_pop_int + sigma2_pop_slope *0+2* sigma_pop_cov *0) / (sigma2_pop_int + sigma2_pop_slope *0+2* sigma_pop_cov *0+ sigma2_residual)# repeatability for `Moisture` = 1R_cond_1 <- (sigma2_pop_int + sigma2_pop_slope *1+2* sigma_pop_cov *1) / (sigma2_pop_int + sigma2_pop_slope *1+2* sigma_pop_cov *1+ sigma2_residual)# repeatability for `Moisture` = 2R_cond_2 <- (sigma2_pop_int + sigma2_pop_slope *2+2* sigma_pop_cov *2) / (sigma2_pop_int + sigma2_pop_slope *2+2* sigma_pop_cov *2+ sigma2_residual)# printing outR_cond_0R_cond_1R_cond_2```# Bonus 2: Creating the same dataset using squidSim```{r}#| eval: false# installing and loading squidSim - https://github.com/squidgroup/squidSim#devtools::install_github("squidgroup/squidSim")library(squidSim)library(lme4)## First Example### DATA STRUCTURE## Two students, each do 6 populations (12 in total), each with 30 individuals, in each combination of the two sexes and three moisture treatments (giving unique individuals 2160 in total). The unique combination of Population, Sex and Moisture is the Container (72 in total) and the unique combination of ID within a container, Sex and Moisture is the Individual.ds1 <-make_structure("Student(2)/Population(6)/ID(30) + Sex(2)+ Moisture(3) + Population:Sex:Moisture + ID:Sex:Moisture",repeat_obs=1,level_names=list(Student=c("A","B"), Sex=c("Female","Male"),Moisture=0:2 ),int_names =c("Container","Individual") )head(ds1)### DATA GENERATIONfull_sim1 <-simulate_population(seed=22376,data_structure = ds1,response_name="BodyL",parameters =list(intercept=15,Population =list(names =c("Population_intercept","Population_slope"),vcov =c(1.3,0.1),beta=c(1,0) # slopes not included in main additive part ),Container =list(names=c("Container_intercept"),vcov=0.1 ),Sex =list(fixed=TRUE, # tells squidSim it is a factornames =c("Female","Male"),beta =c(0,-3) # Female is baseline, then difference between baseline Female and Male ),Student =list(fixed=TRUE, # tells squidSim it is a factornames =c("A","B"),beta =c(0,0.05) # A is baseline, then difference between baseline A and B ),Moisture =list(covariate=TRUE, # tells squidSim it is continuous (not a grouping factor)names =c("Moisture_cont"),beta =c(0.4) ),interactions =list( names ="Population_slope:Moisture_cont", #random slopesbeta=1 ),residual=list(vcov =1 ) ))dat1 <-get_population_data(full_sim1)size_full <-lmer(BodyL ~1+ Sex + Moisture + Student + (1|Container) + (1+ Moisture|Population),data = dat1)summary(size_full)## EXAMPLE 2 - Activity levelds2 <-make_structure("Population(12)/ID(30) + Sex(2)+ Moisture(3) + Population:Sex:Moisture + ID:Sex:Moisture",repeat_obs=3, ## now with 3 obs per individuallevel_names=list(Sex=c("Female","Male"),Moisture=0:2 ),int_names =c("Container","Individual") )full_sim2 <-simulate_population(seed=4242,data_structure = ds2,response_name="Activity",link="log",parameters =list(intercept=0.1,Population =list(names =c("Population_intercept","Population_slope"),vcov =c(1.3,0.1),beta=c(1,0) # slopes not included in main additive part ),Container =list(vcov=0.1 ),Individual =list(vcov=2 ),Sex =list(fixed=TRUE, # tells squidSim it is a factornames =c("Female","Male"),beta =c(0,1.2) # Female is baseline, then difference between baseline Female and Male ),Moisture =list(covariate=TRUE, # tells squidSim it is continuous (not a grouping factor)names =c("Moisture_cont"),beta =c(0.3) ),interactions =list( names ="Population_slope:Moisture_cont", #random slopesbeta=1 ),residual=list(vcov =0.5 ) ))dat2 <-get_population_data(full_sim2)act_full <-lmer(log(Activity) ~1+ Sex + Moisture + (1|Container) + (1+ Moisture|Population) + (1| Individual), data = dat2)summary(act_full)## example 3 - Male Morphds_m <- ds1[ds1$Sex =="Male", ]full_sim3 <-simulate_population(seed=125,data_structure = ds_m,response_name="Colour",link="logit",family="binomial",parameters =list(intercept=0.8,Population =list(vcov =1.2 ),Container =list(vcov=0.1 ),Moisture =list(covariate=TRUE, # tells squidSim it is continuous (not a grouping factor)beta =c(-0.6) ),residual=list(vcov =0 ) ))datm <-get_population_data(full_sim3)morph_full <-glmer(Colour ~1+ Moisture + (1|Container) + (1|Population),family =binomial(link ="logit"),data = datm)summary(morph_full)## example 4 - Egg numberds_f <- ds1[ds1$Sex =="Female", ]full_sim4 <-simulate_population(seed=125,data_structure = ds_f,response_name="Egg",link="log",family="poisson",parameters =list(intercept=0.1,Population =list(vcov =0.4 ),Container =list(vcov=0.05 ),Moisture =list(covariate=TRUE, # tells squidSim it is continuous (not a grouping factor)beta =c(0.4) ),residual=list(vcov =1 ) ))datf <-get_population_data(full_sim4)egg_full <-glmer(Egg ~1+ Moisture + (1|Container) + (1|Population) + (1|Individual), data = datf, family = poisson)summary(egg_full)```# Software and package versions```{r}#| code-fold: truesessioninfo::session_info() %>% details::details(summary ='Current session info',open = F ) %>%pander()```