Quantitative evidence synthesis: a practical guide on meta-analysis, meta-regression, and publication bias tests for environmental sciences
A step-by-step tutorial along with theoretical explanation
Contributors
Shinichi Nakagawa, Yefeng Yang, Erin Macartney, Rebecca Spake, and Malgorzata Lagisz
Update
Last update January 2023
Preface
This step-by-step tutorial is a supplement to our methodological guideline paper invited by Environmental Evidence - the official journal of the Collaboration for Environmental Evidence (CEE).
Intended audience
Our tutorial is aimed at researchers with an interest in using meta-analytic techniques, from any discipline and any level of prior experience with meta-analysis. While our worked examples use questions and data from the environmental sciences, our guidelines and R tutorial contain explanations and point to the relevant statistical theory, so that the techniques can be applied in other scientific disciplines and statistical software (e.g., Python or Stata).
We will update this tutorial when necessary. Readers can access the latest version in our GitHub repository
Credit and citation
This tutorial integrates much of the work and prior R scripts from researchers in Prof. Shinichi Nakagawa’s lab (see full publication list) and from (Associate) Prof. Wolfgang Viechtbauer’s versatile R package metafor (see its documentation, GitHub page, and package webpage).
We encourage the adoption of Open Science practices in the environmental sciences to make the field more open, reliable, and transparent, for example, through the sharing of data and code (for more information, see O’Dea et al. 2021).
If our paper and tutorial have helped you, please cite the following paper:
Shinichi Nakagawa, Yefeng Yang, Erin Macartney, Rebecca Spake, and Malgorzata Lagisz. Quantitative synthesis: a practical guide to meta-analysis, meta-regression, and publication bias tests for environmental sciences. Environmental Evidence, 2023.
Contact
If you have any questions, or any mistakes or bugs to report, please contact the corresponding authors:
- Dr. Yefeng Yang
Evolution & Ecology Research Centre, EERC School of Biological, Earth and Environmental Sciences, BEES The University of New South Wales, Sydney, Australia
Email: yefeng.yang1@unsw.edu.au
- Professor Shinichi Nakagawa, PhD, FRSN
Evolution & Ecology Research Centre, EERC School of Biological, Earth and Environmental Sciences, BEES The University of New South Wales, Sydney, Australia
Email: s.nakagawa@unsw.edu.au
Setting up R on your computer
Our tutorial uses R statistical software and existing R packages, which you will first need to download and install.
If you do not have it on your machine, first install R (download). We also recommend downloading RStudio, a popular integrated development environment for coding with R, created by a company named Posit (download).
After installing R, you must install several packages that contain the functions necessary for performing the analyses in this tutorial. If a package is archived on CRAN, use install.packages() to install it. For example, to install the metafor package (a widely used meta-analysis package), execute install.packages("metafor") in the console (bottom-left pane of RStudio). To install packages that are not on CRAN but are archived in GitHub repositories, use devtools::install_github(). For example, to install orchaRd (a meta-analysis visualization package) from its GitHub repository, execute devtools::install_github("daniel1noble/orchaRd", force = TRUE).
Key package list: metafor, clubSandwich, orchaRd, MuMIn, glmulti, mice, metagear, tidyverse, here, DT, readxl, stringr, GoodmanKruskal, ggplot2, plotly, ggthemr, cowplot, grDevices, grid, gridGraphics, pander, formatR.
These packages might take a little while to install - you may wish to grab a cup of coffee while you wait!
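If you prefer to install and load everything in one go, a minimal sketch is given below. The vector of package names is taken from the list above; orchaRd is installed from GitHub as shown earlier, ggthemr is left out of the CRAN vector because it may also need to be installed from GitHub, and grDevices and grid ship with base R, so they only need to be loaded.
pkgs <- c("metafor", "clubSandwich", "MuMIn", "glmulti", "mice", "metagear",
          "tidyverse", "here", "DT", "readxl", "stringr", "GoodmanKruskal",
          "ggplot2", "plotly", "cowplot", "gridGraphics", "pander", "formatR")
to_install <- setdiff(pkgs, rownames(installed.packages())) # only install what is missing
if (length(to_install) > 0) install.packages(to_install)
# orchaRd (and possibly ggthemr) come from GitHub, as shown above:
# devtools::install_github("daniel1noble/orchaRd", force = TRUE)
invisible(lapply(c(pkgs, "orchaRd", "grDevices", "grid"), library, character.only = TRUE)) # load all packages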
Load custom functions
We also provide some additional helper functions that are necessary for our worked examples. The most straightforward way to use these custom functions is to source them with:
source(here("custom_func","custom_func.R"))
Alternatively, paste the source code (included in the file custom_func.R) into the console and hit “Enter” to let R “learn” these custom functions.
Estimating an effect size statistic
Meta-analyses express the outcomes of multiple studies on a common scale through the calculation of an ‘effect size’ (zj in the main text), representing the magnitude of a difference or the strength of a relationship. The effect size serves as the response (dependent) variable in a meta-analytic model. Several metrics of effect size are available, and the choice of metric should depend on the scientific question under investigation. We discuss three categories of effect size metrics:
- Single-group effects
A statistical summary from one group; e.g., proportion, mean, log standard deviation (lnSD), log coefficient of variation (lnCV).
- Comparative effects
An effect size that quantifies the magnitude of a difference in statistics between two groups. These include, for example, the standardised mean difference (SMD; well-known estimators: Hedges’ g or Cohen’s d), log response ratio (lnRR; aka ratio of means), mean difference (MD), risk (proportion) difference (RD), log odds ratio (lnOR), log variability ratio (lnVR), and log coefficient of variation ratio (lnCVR).
- Association statistics
Quantify the strength of association between two variables; e.g., Fisher’s z-transformation of the correlation coefficient r (Zr).
The definitions, formulas and explanations of commonly used effect sizes (and their sampling variances, νj) can be found in Table 3 in the main text.
For a chosen effect size metric, its point estimate (zj) and sampling variance (νj) can be calculated using the function escalc() in the metafor package.
Let’s load the dataset of our first worked example from [1], who meta-analysed the intraspecific change in seven morpho‐ecophysiological leaf traits along global elevational gradients.
dat_Midolo_2019 <- read.csv(here("data","Midolo_2019_Global Change Biology.csv"))
For simplicity, we will only keep the columns that are necessary to demonstrate the meta-analytic techniques of interest in this tutorial.
Table S1
The variables in the first worked example ([1]).
Table S1 includes the coded variables that were extracted from published papers.
You can find more details about the variables in the supplementary information (Tables S1) of the paper by [1], if interested. Here we classify the variables into three types:
- Bibliographic variables
Variables that distinguish between articles, such as the title (study_name) or publication year.
- Descriptive statistics
Variables that are used to compute effect sizes, such as means (treatment and control), standard deviations (sd_treatment and sd_control) and sample sizes (n_treatment and n_control).
- Study-related characteristics
Study-level variables that can contribute to random variability among the effect sizes, such as study identity (Study_ID), or to systematic variability between the effect sizes, such as treatment intensity or study location characteristics (e.g., elevation), which may be of interest as moderator variables.
After gaining a preliminary understanding of the extracted/coded variables, we can use escalc() to compute an effect size (using lnRR as an example) with the code shown below.
Note that the argument measure is used to specify the effect size metric to be calculated. For example, by specifying measure = "SMD", the function will compute the point estimate of SMD (specifically, Hedges’ g) and its sampling variance. Below are the most commonly used and some underappreciated effect sizes. For other types, look at the help page of escalc by typing ?escalc in the R console.
measure = "ROM"
: lnRR - quantifies the difference between the means of two groupsmeasure = "SMD"
: SMD - quantifies the differences between the means of two groupsmeasure = "ZCOR"
: Zr - quantifies the strength of association between two variablesmeasure = "VR"
: lnVR - quantifies the differences in variances between two groupsmeasure = "CVR"
: lnCVR - quantifies the differences in variances between two groups, accounting for the mean-variance relationship
Note that lnVR and lnCVR are variation/dispersion-based effect sizes, which have practical implications for environmental studies. For example, environmental stressors such as pesticides and eutrophication are likely to increase variability in biological systems because stress accentuates individual differences in environmental responses.
lnRR <- escalc(measure = "ROM",       # "ROM" means ratio of means; lnRR is calculated (alternative effect sizes: "SMD" - SMD, "CVR" - lnCVR, "VR" - lnVR; see above)
               m1i = treatment,       # mean of group 1 (e.g., environmental stressor); here, the mean of a trait measured at the higher elevation
               m2i = control,         # mean of group 2 (e.g., control); here, the mean of the same trait measured at the lower elevation
               sd1i = sd_treatment,   # standard deviation of group 1 (e.g., environmental stressor)
               sd2i = sd_control,     # standard deviation of group 2 (e.g., control)
               n1i = n_treatment,     # sample size of group 1 (e.g., environmental stressor)
               n2i = n_control,       # sample size of group 2 (e.g., control)
               data = dat_Midolo_2019 # dataset containing the above information (the dataset of our worked example)
               )
Let’s explore the effect size estimates and their sampling variances.
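To take a quick look at the computed values (by default, escalc() stores them in columns named yi and vi unless they are renamed, e.g., to lnRR and lnRRV as used in Table S2 and below), you can print the first few rows:
head(lnRR) # the escalc() output created above; the effect size estimate and its sampling variance appear as extra columns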
Table S2
The point estimates of effect size (lnRR) and their sampling variances (lnRRV) for each included study and comparison (higher elevation vs. lower elevation).
Choosing a meta-analytic model
After computing the effect sizes and sampling variances, we can address the first aim of a meta-analysis - estimating an overall effect.
Two traditional meta-analytic models can easily be fitted via the function rma() in metafor. These are rarely appropriate, and we usually need a more complex model, called a multilevel meta-analytic model. However, we fit the traditional models here for comparison.
- Fixed-effect model (‘common-effect’ or ‘equal-effect’ model; Equation 1)
z_{j} = \beta_{0} + m_{j}, (1)\\ m_{j} \sim N(0,\nu_{j})
All notations can be found in the main text. The corresponding R code is:
mod_FE_lnRR <- rma(yi = lnRR,     # calculated/estimated effect sizes (the output of the escalc() function)
                   vi = lnRRV,    # calculated/estimated sampling variances of lnRR
                   method = "EE", # fixed-effect (equal-effects) model is specified
                   data = dat2_Midolo_2019 # our dataset
                   )
Below is the output of our fitted fixed-effect model. The Model Results address the first aim of an environmental meta-analysis - estimating the overall mean effect size - including its magnitude (β0), uncertainty (standard error, SE[β0]), significance test, and confidence intervals (CIs). We will explain how to properly interpret all the elements of the printed model results when we get to the next section - the multilevel meta-analytic model.
Equal-Effects Model (k = 1294)
logLik deviance AIC BIC AICc
-18108.6895 41267.0300 36219.3790 36224.5445 36219.3821
I^2 (total heterogeneity / total variability): 96.87%
H^2 (total variability / sampling variability): 31.92
Test for Heterogeneity:
Q(df = 1293) = 41267.0300, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
-0.0144 0.0004 -34.0926 <.0001 -0.0152 -0.0135 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
- Random-effects model (Equation 2)
z_{j} = \beta_{0} + \mu_{j} + m_{j}, (2)\\ \mu_{j} \sim N(0,\tau^2), m_{j} \sim N(0,\nu_{j})
All notations can be found in the main text. Equation 2 can also be fitted via rma().
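The code chunk for this model parallels the fixed-effect call above; a minimal sketch (the object name mod_RE_lnRR is ours, and we assume the same effect size columns) is:
mod_RE_lnRR <- rma(yi = lnRR,       # effect sizes (lnRR)
                   vi = lnRRV,      # their sampling variances
                   method = "REML", # REML estimator for the between-study variance (tau^2)
                   data = dat2_Midolo_2019 # our dataset
                   )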
Again, we will elaborate on the interpretation in a later section; the key estimates from this random-effects model are compared with the other models in Table S3 below.
It should be noted that, by fitting the above two traditional models, we assume that all k = 1294 effect sizes are independent. This assumption is violated because some studies contributed more than one effect size, and those effect sizes are not independent. We will elaborate on this in the next sections!
- Technical Recommendation
Setting method = "REML" fits a random-effects model using the restricted maximum likelihood estimator to estimate the heterogeneity variance and the corresponding model coefficients (e.g., β0). We can also fit a random-effects model with the above syntax but set the method argument to a different estimator for the variance components: method = "PM" - Paule-Mandel method; method = "ML" - maximum likelihood estimator; method = "DL" - DerSimonian-Laird estimator; method = "HE" - Hedges estimator. Generally, we recommend REML because it produces (approximately) unbiased estimates of variance components (e.g., τ2) according to various simulation studies. Note, however, that the Paule-Mandel method has superior properties for common effect size measures, such as SMD, in the context of the random-effects model, and it is recommended when fitting traditional random-effects models. The Paule-Mandel method does not generalize straightforwardly to multilevel models; put differently, it is not available in the rma.mv function.
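For instance, a minimal sketch of refitting the traditional random-effects model with the Paule-Mandel estimator (the object name is ours):
mod_RE_lnRR_PM <- rma(yi = lnRR,
                      vi = lnRRV,
                      method = "PM", # Paule-Mandel estimator of tau^2
                      data = dat2_Midolo_2019)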
As we explain in the main text, a common statistical issue confronted by environmental meta-analysts is non-independence among data points (i.e., effect sizes). This arises because each study (article) often contributes more than one effect size. To handle non-independent effect sizes, the simplest meta-analytic model for environmental sciences is the multilevel model:
z_{i} = \beta_{0} + \mu_{j[i]} + e_{i} + m_{i}, (3)\\ \mu_{j[i]} \sim N(0,\tau^2), e_{i} \sim N(0,\sigma^2), m_{i} \sim N(0,\nu_{i})
The dataset of [1] (Table S1) has N = 71 primary studies with k = 1294 effect sizes. The ratio k/N ≈ 18 indicates statistical dependence: on average, each primary study contributes 18 effect sizes. See Figure 2 in the main text for more situations of statistical non-independence in environmental meta-analysis. Equation 3 adds a random-effects term, the within-study effect ei (effect-size level), to account for non-independence due to multiple effect sizes contributed by the same study (i.e., clustering).
We need to handle non-independence from the very beginning of our analysis. When preparing our data file (e.g., .csv file), we need to structure our data in a way that permits the incorporation of non-independence among effect sizes. To do so, we need to code a unique identifier for each primary study (Study_ID: S1, S2, S3, …) and a unique identifier for each effect size (ES_ID: ES1, ES2, ES3, …).
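If your raw data file does not yet contain an effect-size identifier, a minimal (illustrative) way to create one is:
dat2_Midolo_2019$ES_ID <- paste0("ES", seq_len(nrow(dat2_Midolo_2019))) # a unique label (ES1, ES2, ...) for every row/effect size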
With these variables coded, we can use the function rma.mv (rather than rma) to fit the multilevel meta-analytic model (Equation 3). Supplying Study_ID and ES_ID to the argument random, we can specify the two random-effects terms in Equation 3: the study level μj[i] (between-study effect) and the effect-size level ei (within-study effect).
We use a formula to define the random-effects structure specified in random: it starts with ~ 1, followed by a |, then an identifier denoting the random effect (e.g., Study_ID or ES_ID). The complete syntax to fit Equation 3 is:
mod_ML_lnRR <- rma.mv(yi = lnRR,
                      V = lnRRV,
                      random = list(~1 | Study_ID, # allows true effect sizes to vary among primary studies - accounts for the between-study effect and quantifies between-study heterogeneity
                                    ~1 | ES_ID),   # allows true effect sizes to vary within primary studies - accounts for the within-study effect and quantifies within-study heterogeneity
                      method = "REML", # REML is used as the estimator for variance components, as suggested above
                      data = dat2_Midolo_2019 # our dataset
                      )
Alternatively, you can use Study_ID/ES_ID to specify the random argument, which is a more straightforward way of showing the nesting structure:
mod_ML_lnRR <- rma.mv(yi = lnRR,
                      V = lnRRV,
                      random = ~1 | Study_ID/ES_ID, # alternative syntax to specify the random-effects structure
                      method = "REML", # REML is used as the estimator for variance components, as suggested above
                      data = dat2_Midolo_2019 # our dataset
                      )
The classic results of the fitted multilevel model (Equation 3) look like:
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-138.1757 276.3514 282.3514 297.8456 282.3700
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0191 0.1381 71 no Study_ID
sigma^2.2 0.0515 0.2269 1294 no ES_ID
Test for Heterogeneity:
Q(df = 1293) = 41267.0300, p-val < .0001
Model Results:
estimate se zval pval ci.lb ci.ub
0.0297 0.0194 1.5319 0.1255 -0.0083 0.0678
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s go through the results in order of appearance.
- Multivariate Meta-Analysis Model details
We are first reminded of the model we have fitted:
k = the number of effect sizes (k) supplied to the multilevel model.
method: REML = the REML method was specified as the estimation procedure for model fitting to obtain model estimates (e.g., variance components, model coefficients).
- Fit statistics
The next line contains several fit statistics, which can be used for model comparison, including:
logLik = restricted log-likelihood of the fitted model,
Deviance = goodness-of-fit statistic of the fitted model,
AIC = Akaike information criterion score of the fitted model,
BIC = Bayesian information criterion, and
AICc = AIC corrected for small sample sizes.
- Variance Components
sigma^2.1 = between-study variance τ2. In this example, this is conceptually equivalent to the between-study heterogeneity variance τ2 in a random-effects model (formulated as Equation 2).
sigma^2.2 = within-study variance σ2.
estim = the estimated amount of each variance component.
sqrt = the square root (standard deviation) of each variance component.
nlvls = the number of levels of each random effect.
factor = the name of the variable supplied to the argument random to specify the random effect.
- Test for Heterogeneity
Results of the heterogeneity test, which tests whether there are differences among the effect sizes in the data (based on Cochran’s Q-test, which tests the null hypothesis that all studies share the same/equal true effect).
p-val < 0.05 = the effect sizes derived from the environmental studies are heterogeneous. It is also good to know how much of the heterogeneity is due to differences within studies, and how much is due to between-study differences.
- Model Results
Results for the model coefficients and the corresponding model inference:
estimate = the estimate of the pooled/average/overall effect, also known as the grand mean or meta-analytic effect size (β0 in Equation 3).
se = the standard error of the estimate; here, it is SE[β0].
zval = the value of the test statistic (in our case, the z-value; see the Technical Recommendation below for our recommended practices).
ci.lb = lower bound of the (95% by default) confidence interval (CI).
ci.ub = upper bound of the CI.
The argument test in the functions rma.mv and rma is used to specify the method used to calculate test statistics and perform significance tests of model coefficients such as β0 (confidence intervals and p-values). By default, rma.mv and rma set test = "z", which uses a test that assumes a normal distribution. However, when meta-analysing a small number of studies, setting test = "z" will lead to an inflated Type I error rate, i.e., more false-positive results. To achieve nominal performance of the tests, it is highly recommended to set test = "t", which uses a t-distribution with k - p degrees of freedom for testing model coefficients (e.g., β0) and constructing CIs (p = the number of model coefficients). There are also other possible adjustments, for example, adjusting the degrees of freedom of the t-distribution, which can be implemented by setting dfs = "contain".
Therefore, a multilevel model with improved model inference can be fitted with:
mod_ML_lnRR <- rma.mv(yi = lnRR,
                      V = lnRRV,
                      random = list(~1 | Study_ID, # accounts for the between-study effect and quantifies between-study heterogeneity
                                    ~1 | ES_ID),   # accounts for the within-study effect and quantifies within-study heterogeneity
                      method = "REML",  # REML is used as the estimator for variance components, as suggested above
                      test = "t",       # t-distribution is specified for the tests of model coefficients and CIs
                      dfs = "contain",  # the method used to adjust the (denominator) degrees of freedom
                      data = dat2_Midolo_2019 # our dataset
                      )
The corresponding results look like this: under Model Results, zval now appears as tval. Note that the value is essentially unchanged because this worked example has a large number of primary studies (N = 71), so the z-distribution (standard normal distribution) is approximately the same as the t-distribution. When the number of primary studies (N) is low, setting test = "t" and dfs = "contain" is highly recommended.
As we explain in the main text, failing to deal with statistical non-independence (for details, see Figure 2) inflates the Type I error rate: the standard error of β0 is underestimated and its p-value becomes spuriously small (our first example shows this problem; see below). Note that the point estimate of β0 is not necessarily biased. However, the bias can be large and can even change the sign/direction of the effect size (as is the case in our second worked example; see a later section).
Below, we compare the results of the traditional models (fixed- and random-effects models) and the more appropriate multilevel meta-analytic model, and show how ignoring statistical non-independence in the traditional models distorts the meta-analytic evidence. Let’s look at Table S3, which shows the results of the traditional models and the multilevel model.
Table S3
Comparison of the random-effects and multilevel meta-analytic models.
- Biased results by ignoring statistical non-independence
From Table S3, we can see that the conclusions based on our proposed multilevel model conflict with those reached by the random-effects model: the p-value and 95% CIs show that the multilevel model finds a statistically non-significant overall effect (β0), whereas the random-effects model finds a statistically significant overall effect (β0).
This indicates that using the traditional model to fit non-independent effect sizes underestimates the standard error of the model coefficient (SE[β0]) and distorts the corresponding statistical inference (producing a spuriously small p-value): SE[β0] = 0.019 in the multilevel model vs. SE[β0] = 0.008 in the random-effects model; p-value = 0.1255 in the multilevel model vs. p-value = 2 × 10^{-4} in the random-effects model. The 95% CI from the multilevel model is also wider than that from the random-effects model: [-0.008 to 0.068] vs. [0.014 to 0.068]. Therefore, if we had fitted this dataset ([1]) with a random-effects model, we might have wrongly concluded that increasing elevation has, on average, a statistically significant effect on intraspecific leaf traits.
The multilevel model accounts for statistical non-independence due to multiple effect sizes derived from the same study. We can quantify the degree of dependence using an index called the intraclass correlation coefficient:
ICC=\frac{\tau^2} {\tau^2+\sigma^2}
The ICC index can reflect the intraclass correlation of the true effects within the same study. In the above fitted model, the ICC can be computed with:
mod_ML_lnRR$sigma2[1] / sum(mod_ML_lnRR$sigma2)
[1] 0.2703379
An ICC value of 0.2703 indicates that the underlying true effects (quantified as lnRR in this case) are weakly correlated within studies. The traditional models, however, ignore this dependence and regard all true effects within the same study as independent (which would correspond to ICC = 0). As a result, the meta-analytic conclusions change qualitatively (from statistically significant to non-significant), reflecting the inflated Type I error rate of the traditional models.
- Mis-estimation of heterogeneity statistics
From Table S3, we can also see that the random-effects model gives misleading results with respect to the variance and heterogeneity among effect sizes. This is because the random-effects model is likely to exaggerate the between-study variance (τ2): the value of τ2 in the random-effects model (0.065) is more than three times that in the multilevel model (0.019). The random-effects model wrongly attributes within-study variance (σ2) to between-study variance (τ2). Therefore, fitting this dataset with a random-effects model would lead to the incorrect conclusion that there is a high amount of between-study heterogeneity, overlooking the fact that there is large variance and heterogeneity within studies (σ2 = 0.052 and I2effect = 72.73%; see the calculation of I2effect in the section Quantifying & explaining heterogeneity).
As explained in the main text, when multiple effect sizes are contributed by a single article, there will be two broad types of non-independence (see Figure 2 for detailed cases):
non-independence among effect sizes
non-independence among sampling errors
The multilevel model we propose (Equation 3) only deals with non-independence among effect sizes (zi). For non-independence among sampling errors (νi) - for example, due to shared control statistics - we can construct a variance-covariance (VCV) matrix to explicitly capture the non-zero covariance, Cov[\nu_{1[1]},\nu_{1[2]}] = \rho_{12}\sqrt{\nu_{1[1]}\nu_{1[2]}}, that arises from correlated sampling errors within the same primary study:
\boldsymbol{M} =
\begin{bmatrix}
\nu_{1[1]} & \rho_{12}\sqrt{\nu_{1[1]}\nu_{1[2]}} & 0 \\
\rho_{21}\sqrt{\nu_{1[2]}\nu_{1[1]}} & \nu_{1[2]} & 0 \\
0 & 0 & \nu_{2[3]}
\end{bmatrix}, (4)
Constructing such a VCV matrix exactly is almost impossible because the within-study correlation (e.g., \rho_{12}) is rarely available from primary environmental studies, although exact covariances can be computed for some comparative effect statistics (see Table S2). But we can impute a VCV matrix by assuming a constant \rho across different environmental studies (\rho_{ik}=\cdots=\rho_{jh}\equiv\rho). With this simple assumption, we can impute the VCV matrix via the function impute_covariance_matrix() from the package clubSandwich:
VCV <- impute_covariance_matrix(vi = dat2_Midolo_2019$lnRRV,         # sampling variances that are correlated with each other within the same study
                                cluster = dat2_Midolo_2019$Study_ID, # study identity - clustering variable
                                r = 0.5 # assuming that sampling errors within the same study are correlated with rho = 0.5
                                )
This can also be done via the function vcalc() in metafor:
VCV <- vcalc(vi = lnRRV,        # sampling variances that are correlated with each other within the same study
             cluster = Study_ID, # study identity - clustering variable
             obs = ES_ID,        # different effect sizes corresponding to the same response/dependent variable
             data = dat2_Midolo_2019,
             rho = 0.5           # assuming that sampling errors within the same study are correlated with rho = 0.5
             )
Setting rho = 0.5 means we assume that sampling errors within the same study are correlated with \rho = 0.5 (see the Technical Recommendation below). Let’s see what the imputed VCV matrix looks like, using the Bansal and Germino (2010) study as an example:
[,1] [,2] [,3] [,4]
[1,] 0.023 0.009 0.009 0.010
[2,] 0.009 0.013 0.007 0.007
[3,] 0.009 0.007 0.015 0.008
[4,] 0.010 0.007 0.008 0.017
As we can see, the VCV matrix is a symmetric matrix, with the sampling variances (\nu_{i}) on the diagonal and the covariances (Cov[\nu_{i},\nu_{k}]) on the off-diagonals.
Let’s account for both non-independence among effect sizes and sampling errors:
mod_ML_lnRR_var <- rma.mv(yi = lnRR,
                          V = VCV, # the imputed VCV matrix is supplied in place of the vector of sampling variances
                          random = list(~1 | Study_ID,
                                        ~1 | ES_ID),
                          method = "REML",
                          test = "t",
                          data = dat2_Midolo_2019
                          )
We can explore whether the model coefficients and corresponding significance tests change:
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-107.3517 214.7034 220.7034 236.1976 220.7220
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0114 0.1066 71 no Study_ID
sigma^2.2 0.0516 0.2272 1294 no ES_ID
Test for Heterogeneity:
Q(df = 1293) = 65913.4258, p-val < .0001
Model Results:
estimate se tval df pval ci.lb ci.ub
0.0268 0.0170 1.5751 1293 0.1155 -0.0066 0.0602
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that the overall effect (\beta_0 = 0.03, 95% CIs = [-0.01 to 0.06]) and the corresponding significance test (p-value = 0.1155) barely change after accounting for non-independence among sampling errors. But this does not mean we do not need to account for it in general.
- Technical Recommendation
Certain values of \rho have been recommended in several published papers, for example, rho = 0.5 or, more conservatively, rho = 0.8. We should check the robustness of our results against different values of \rho (see Conducting sensitivity analysis & critical appraisal for a pipeline for such a sensitivity analysis).
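For example, a minimal sketch of such a sensitivity check (the object names are ours) refits the model under several assumed values of \rho and compares the overall effect:
rhos <- c(0.3, 0.5, 0.8)
sens_rho <- lapply(rhos, function(r) {
  VCV_r <- vcalc(vi = lnRRV, cluster = Study_ID, obs = ES_ID,
                 data = dat2_Midolo_2019, rho = r) # re-impute the VCV matrix under rho = r
  rma.mv(yi = lnRR, V = VCV_r,
         random = list(~1 | Study_ID, ~1 | ES_ID),
         method = "REML", test = "t", data = dat2_Midolo_2019)
})
# compare the overall effect (beta_0) and its standard error across the assumed rho values
data.frame(rho = rhos,
           b = sapply(sens_rho, function(m) as.numeric(m$b)[1]),
           se = sapply(sens_rho, function(m) m$se[1]))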
Robust variance estimation (RVE) provides a way to include dependent effect sizes in meta-regression, even when the nature of the dependence structure is unknown. RVE methods do not require knowledge of the exact dependence structure between effect size estimates, and instead, RVE approximates the dependence structure. We recommend that RVE is applied within the framework of the multilevel model to make robust inference and to partition variance components to provide more insights into heterogeneity.
It is very straightforward to combine RVE with the multilevel model. We can supply the fitted multilevel model (an rma.mv object) to the function coef_test() from the clubSandwich package:
mod_ML_RVE <- coef_test(mod_ML_lnRR_var, # fitted multilevel model for which to make robust inferences (an object of class "rma.mv" can be supplied directly to coef_test())
                        vcov = "CR2",    # 'bias-reduced linearization' is specified to approximate the variance-covariance matrix
                        cluster = dat2_Midolo_2019$Study_ID # study identity - clustering variable
                        )
RVE computes a cluster-robust standard error and uses it for the subsequent null-hypothesis tests (t-stat, p-val) and confidence intervals of the model coefficients (\beta_{0}):
Coef. Estimate SE t-stat d.f. p-val (Satt) Sig.
1 intrcpt 0.0297 0.0194 1.53 64.7 0.13
Alternatively, we can use the function robust() in metafor to implement RVE:
robust(mod_ML_lnRR_var, cluster = dat2_Midolo_2019$Study_ID, clubSandwich = TRUE)
Multivariate Meta-Analysis Model (k = 1294; method: REML)
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0191 0.1381 71 no Study_ID
sigma^2.2 0.0515 0.2269 1294 no ES_ID
Test for Heterogeneity:
Q(df = 1293) = 41267.0300, p-val < .0001
Number of estimates: 1294
Number of clusters: 71
Estimates per cluster: 2-90 (mean: 18.23, median: 10)
Model Results:
estimate se1 tval1 df1 pval1 ci.lb1 ci.ub1
0.0297 0.0194 1.5327 64.72 0.1302 -0.0090 0.0685
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1) results based on cluster-robust inference (var-cov estimator: CR2,
approx t-test and confidence interval, df: Satterthwaite approx)
We can see that the RVE results from the function coef_test() are the same as those from the function robust().
- Technical Recommendation
We do not yet have a definite recommendation on which method to use to account for non-independence among sampling errors (using the VCV matrix or RVE). This is because no simulation work in the context of multilevel meta-analysis has been done so far.
Quantifying & explaining heterogeneity
The second aim of a meta-analysis is to quantify inconsistency (heterogeneity) among studies. In our main text, we recommend two ways of doing this: using absolute and relative measures of heterogeneity.
- Absolute heterogeneity measure
We can use variance components such as \tau^2, which can be directly extracted from the output of the fitted model (see section Results interpretation); \tau^2 in a random-effects model (or \tau^2 + \sigma^2 in a multilevel model) directly reflects the differences among the underlying true effects, because its square root can be interpreted as the standard deviation of the true effect sizes. An often neglected insight is that the sum \tau^2 + \sigma^2 denotes the total amount of variation, or heterogeneity, in the true effects. In this worked example ([1]), the variation in the true effects (\tau^2 + \sigma^2) is 0.0706, which is quite large given the magnitude of the overall effect (\beta_0 = 0.0297). Put differently, the heterogeneity of the true effects is more than twice the magnitude of the overall effect (0.0706/0.0297 ≈ 2.4).
- Relative heterogeneity measure
We can use I^2 statistic to express variance due to differences between studies (not due to sampling variance) in the case of random-effects model (Equation 2):
I^2=\frac{\tau^2} {\tau^2+\overline{\nu}}, (5)\\ \overline{\nu}=\frac{(N_{effect}-1)\sum_{i=1}^{k} 1/\nu_{i}} {(\sum_{i=1}^{k} 1/\nu_{i})^2-\sum_{i=1}^{k} 1/\nu_{i}^2}, (6)
In the case of multilevel model (Equation 3), the formulas can be written as:
I^2_{total}=\frac{\tau^2+\sigma^2} {\tau^2+\sigma^2+\overline{\nu}}, (7)
I^2_{total} can be further decomposed into variance due to differences between studies (I^2_{study}) and variance due to differences within studies (I^2_{effect}):
I^2_{study}=\frac{\tau^2} {\tau^2+\sigma^2+\overline{\nu}}, (8)
I^2_{effect}=\frac{\sigma^2} {\tau^2+\sigma^2+\overline{\nu}}, (9)
We have developed a function i2_ml() (in the orchaRd package) to compute Equations 7 to 9:
i2_ml(mod_ML_lnRR_var)
I2_Total I2_Study_ID I2_ES_ID
99.63261 17.97605 81.65656
We see that there is small between-study heterogeneity (indicated by I^2_{study}) but large within-study heterogeneity (indicated by I^2_{effect}). If we had used a random-effects model to quantify heterogeneity, we might have concluded that the high heterogeneity is due to between-study differences (i.e., \tau^2 and I^2; see Table S3).
There is an argument boot that allows us to compute (percentile) CIs of I^2. Note that calculating CIs of I^2 may take a long time to run because i2_ml() uses a (parametric) bootstrapping method.
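For example, a minimal sketch (the number of bootstrap iterations here is arbitrary and kept small to limit run time):
i2_ml(mod_ML_lnRR_var, boot = 100) # point estimates of I^2 together with percentile CIs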
The third aim of a meta-analysis is to explain heterogeneity. To do so, it is recommended to use meta-regression models rather than subgroup analyses (where one divides the dataset according to one predictor variable [e.g., sex: male vs. female] and conducts separate meta-analyses for each subset). In the context of meta-analysis, a predictor variable is often called a moderator, indicating that the magnitude of the overall effect may change systematically as a function of that variable.
Building upon the multilevel model (Equation 3), a meta-regression model can be constructed by adding one moderator variable (AKA predictor, independent/explanatory variable, or fixed factor):
z_{i} = \beta_{0} + \beta_{1}x_{1j[i]} + \mu_{j[i]} + e_{i} + m_{i}, (10)\\ \mu_{j[i]} \sim N(0,\tau^2), e_{i} \sim N(0,\sigma^2), m_{i} \sim N(0,\nu_{i})
The model coefficient \beta_1 denotes the slope of the moderator variable x_{1}, i.e., how the moderator affects the magnitude of the overall effect.
If we are interested in the effects of more than one moderator, we can put several moderator variables into a single meta-regression model, leading to multiple meta-regression (note that post-hoc hypotheses are generally not recommended, so you need to have a clear hypothesis about each variable rather than adopting a data-driven strategy):
z_{i} = \beta_{0} + \sum \beta_{h}x_{h[i]} + \mu_{j[i]} + e_{i} + m_{i}, (11)\\ \mu_{j[i]} \sim N(0,\tau^2), e_{i} \sim N(0,\sigma^2), m_{i} \sim N(0,\nu_{i})
We can fit a (multiple) meta-regression model (as formulated in Equations 10 and 11) using the function rma.mv(). The moderator, x_{1}, can be supplied to the argument mods using the following formula syntax: start with a tilde ~, followed by the name of the moderator (e.g., mods = ~ x_1).
In our worked example, [1] examined whether intraspecific leaf trait variation (measured as lnRR) changes in response to differences in elevation. To answer this question, we can construct a meta-regression model with the difference in elevation as a moderator, which is coded in the elevation column of the dataset (we log-transformed the elevation value of each study to reduce skewness and labelled it elevation_log). Setting mods = ~ elevation_log fits such a meta-regression model:
mod_MLMR_lnRR_elevation <- rma.mv(yi = lnRR,
                                  V = VCV,
                                  mods = ~ elevation_log, # adding elevation as a moderator variable (for continuous variables, log-transformation is good practice to avoid skewness)
                                  random = list(~1 | Study_ID,
                                                ~1 | ES_ID),
                                  method = "REML",
                                  test = "t",
                                  data = dat2_Midolo_2019,
                                  sparse = TRUE
                                  )
The results of the above meta-regression model are very similar in structure to those of a multilevel meta-analytic model (as shown in Results interpretation under Multilevel meta-analytic model):
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-103.6834 207.3669 215.3669 236.0227 215.3979
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0098 0.0991 71 no Study_ID
sigma^2.2 0.0515 0.2269 1294 no ES_ID
Test for Residual Heterogeneity:
QE(df = 1292) = 65188.5301, p-val < .0001
Test of Moderators (coefficient 2):
F(df1 = 1, df2 = 1292) = 8.0348, p-val = 0.0047
Model Results:
estimate se tval df pval ci.lb ci.ub
intrcpt -0.1507 0.0647 -2.3290 1292 0.0200 -0.2776 -0.0238 *
elevation_log 0.0290 0.0102 2.8346 1292 0.0047 0.0089 0.0491 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, we only discuss results that are different from those from the multilevel meta-analytic model.
- Test for Residual Heterogeneity
QE = the test statistic used to test whether there is a large amount of “residual heterogeneity” among effect sizes, i.e., heterogeneity that is not explained by the included moderator (here, elevation).
p-val < 0.0001 = the residual heterogeneity is still substantial (statistically significantly larger than expected from sampling variance alone).
- Test of Moderators (coefficient 2)
This section reports the omnibus (joint) test of the model coefficients included in the test, here the null hypothesis:
H_0:\beta_1=0
coefficient 2 = the second model coefficient, i.e., the slope \beta_1 of x_1 (elevation), is tested; the intercept (\beta_0, coefficient 1) is not included in this test by default.
F(df1 = 1, df2 = 1292) = the omnibus test is based on an F-distribution with m (here 1) and k (1294) - p (2) degrees of freedom, where m is the number of model coefficients tested and p the total number of model coefficients. We recommend using an F-distribution rather than a chi-square distribution to improve inference; to do so, set test = "t" rather than test = "z" (the default of rma.mv()).
p-val = 0.0047 = the p-value of this test, indicating that we can reject the null hypothesis that elevation has no effect on leaf traits.
- Model Results
intrcpt = the intercept \beta_0 in Equation 10. Note that this \beta_0 is distinct from \beta_0 (the overall effect) in Equations 1 - 3; here, \beta_0 is the expected effect size (lnRR) when the moderator equals 0. In other words, setting x_1 = 0 gives the intercept \beta_0.
elevation_log = the slope \beta_1 of x_1 (in this case, elevation), denoting the estimated effect of the tested moderator x_1.
With respect to the hypothesis tested in [1], the meta-regression confirms that elevation has a positive relationship with intraspecific leaf trait change (quantified as lnRR).
As explained in our main text, a categorical moderator can also be incorporated into a meta-regression by creating ‘dummy’ variables representing the various levels/subgroups (see the main text for an example). R can create dummy variables for us automatically. Therefore, parameterizing a meta-regression with a categorical moderator is the same as with a continuous moderator, with one exception: we can decide whether to keep or remove the intercept (see details below).
Let’s say we aim to investigate whether the effect of elevation differs among leaf trait types. This variable is coded as trait in the dataset and includes 7 levels: specific leaf area (SLA), leaf mass per area (LMA), leaf area (LA), nitrogen concentration per unit area (Narea), nitrogen concentration per unit mass (Nmass), phosphorus concentration per unit mass (Pmass) and carbon isotope composition (d13C).
Traditionally, we would divide the dataset into 7 subgroups and subsequently conduct 7 separate meta-analyses. A more powerful method is to fit a meta-regression with this categorical moderator (trait):
mod_MLMR_lnRR_trait <- rma.mv(yi = lnRR,
                              V = VCV,
                              mods = ~ trait - 1, # '-1' removes the intercept so that the estimated effect for each subgroup/level of the categorical moderator is obtained directly
                              random = list(~1 | Study_ID,
                                            ~1 | ES_ID),
                              method = "REML",
                              test = "t",
                              data = dat2_Midolo_2019
                              )
The corresponding model output is then:
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-71.6727 143.3454 161.3454 207.7860 161.4863
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0112 0.1058 71 no Study_ID
sigma^2.2 0.0486 0.2205 1294 no ES_ID
Test for Residual Heterogeneity:
QE(df = 1287) = 63967.9953, p-val < .0001
Test of Moderators (coefficients 1:7):
F(df1 = 7, df2 = 1287) = 11.4776, p-val < .0001
Model Results:
estimate se tval df pval ci.lb ci.ub
I(trait)dC13 -0.0136 0.0243 -0.5596 1287 0.5758 -0.0613 0.0341
I(trait)LA -0.0672 0.0275 -2.4493 1287 0.0144 -0.1211 -0.0134 *
I(trait)LMA 0.0617 0.0251 2.4566 1287 0.0142 0.0124 0.1109 *
I(trait)Narea 0.0820 0.0243 3.3816 1287 0.0007 0.0344 0.1296 ***
I(trait)Nmass 0.0842 0.0214 3.9409 1287 <.0001 0.0423 0.1261 ***
I(trait)Pmass 0.1633 0.0332 4.9155 1287 <.0001 0.0981 0.2285 ***
I(trait)SLA -0.0258 0.0241 -1.0702 1287 0.2848 -0.0732 0.0215
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now, the results under Test of Moderators are a joint test of the null hypothesis for the slopes of the 7 dummy variables (\beta_1 to \beta_7; there is no intercept \beta_0 because we removed it via -1):
H_0:\beta_1=\beta_2=\beta_3=\beta_4=\beta_5=\beta_6=\beta_7=0
p-val < .0001 means the null hypothesis is rejected, suggesting that trait type as a whole indeed affects the average effect of elevation (there is at least one subgroup showing a statistically significant effect).
The results printed under Model Results directly give the estimated effect for each of the 7 traits, because removing the intercept (via '-1') makes all dummy variables enter the model as moderators. We see that 5 traits (LA, LMA, Narea, Nmass and Pmass) have statistically significant effects, while the other 2 traits (dC13 and SLA) do not.
- Technical Recommendation
Setting mods = ~ x (in this case, mods = ~ trait) keeps the intercept (\beta_0) in the model:
mod_MLMR_lnRR_trait2 <- rma.mv(yi = lnRR,
                               V = VCV,
                               mods = ~ I(trait),
                               random = list(~1 | Study_ID,
                                             ~1 | ES_ID),
                               method = "REML",
                               test = "t",
                               data = dat2_Midolo_2019
                               )
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-71.6727 143.3454 161.3454 207.7860 161.4863
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0112 0.1058 71 no Study_ID
sigma^2.2 0.0486 0.2205 1294 no ES_ID
Test for Residual Heterogeneity:
QE(df = 1287) = 63967.9953, p-val < .0001
Test of Moderators (coefficients 2:7):
F(df1 = 6, df2 = 1287) = 12.9754, p-val < .0001
Model Results:
estimate se tval df pval ci.lb ci.ub
intrcpt -0.0136 0.0243 -0.5596 1287 0.5758 -0.0613 0.0341
I(trait)LA -0.0536 0.0304 -1.7666 1287 0.0775 -0.1132 0.0059 .
I(trait)LMA 0.0753 0.0283 2.6639 1287 0.0078 0.0198 0.1307 **
I(trait)Narea 0.0956 0.0265 3.6063 1287 0.0003 0.0436 0.1477 ***
I(trait)Nmass 0.0978 0.0242 4.0417 1287 <.0001 0.0503 0.1452 ***
I(trait)Pmass 0.1769 0.0354 5.0013 1287 <.0001 0.1075 0.2463 ***
I(trait)SLA -0.0122 0.0285 -0.4285 1287 0.6684 -0.0681 0.0437
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By default, R orders the dummy variables alphabetically. In our case, dC13 is set as the reference level, and its associated dummy variable is dropped from the meta-regression model. Why dC13 rather than another level? Because the letter “d” (dC13) comes before the other letters, for example, “L” (LA) and “S” (SLA). Therefore, the model intercept \beta_{0} (intrcpt) represents the estimated effect for the dC13 subgroup (\beta_0 = -0.0136, 95% CI = [-0.0613 to 0.0341], p-value = 0.5758). The remaining model coefficients denote contrasts between the reference level (intrcpt, i.e., dC13) and the other 6 levels/groups. Therefore, keeping the intercept (not adding -1) allows us to test whether the other levels/subgroups differ from the reference level in terms of the average effect. If we aim to compare the estimated effect of one particular level with the others, we can set that level as the reference using the function relevel() or factor() prior to model fitting (a sketch is given below). Alternatively, we can use the function anova() to obtain the contrasts. Note that if you use a subgroup analysis rather than the recommended meta-regression to explain heterogeneity, you have to compute these contrasts by hand, using a Wald-type test to calculate the test statistics and then performing the significance tests.
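A minimal sketch of changing the reference level before refitting the intercept model (the choice of SLA here is purely illustrative):
dat2_Midolo_2019$trait <- relevel(factor(dat2_Midolo_2019$trait), ref = "SLA") # SLA becomes the reference level; refitting then reports contrasts of the other traits against SLA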
The goodness-of-fit index R^2 can be used to quantify the percentage of variance explained by a moderator, and thus indicates the importance of the examined moderator. It can also be used for model selection, to choose the model with the ‘best’ set of moderators. A general and readily implementable form is the marginal R^2 proposed by [2], which can be calculated as:
R^2_{marginal}=\frac{f^2} {f^2+\tau^2+\sigma^2}, (12)\\ f^2 = Var(\sum \beta_{h}x_{h[i]}), (13)
We wrote a function r2_ml() (in the orchaRd package), which calculates R^2_{marginal}:
r2_ml(mod_MLMR_lnRR_elevation)
R2_marginal R2_conditional
0.009803104 0.168309094
The value given under R2_marginal shows that elevation explains 0.98% of the variation among effect sizes.
Notes on visualisation and interpretation
For environmental meta-analyses, a forest plot is often used to visualize the distribution of effect sizes and the 95% CIs for each study, as well as the overall effect based on the model (e.g., \beta_0 and its 95% CIs). A classic forest plot (Figure S1) can be made with the function forest() in the metafor package:
forest(mod_ML_lnRR_var, # rma.mv object containing the fitted multilevel model
       xlab = "Effect size lnRR")
At the bottom of the forest plot (Figure S1) made by forest(), there is a four-sided polygon (known as the ‘diamond’) representing the overall effect based on the fitted model (in our case, mod_ML_lnRR_var, the rma.mv object containing the outputs of the multilevel model). The center of the diamond denotes the point estimate, and the left/right edges correspond to the lower and upper CI bounds. However, we cannot see these features clearly because the number of studies in this dataset is very large (the plot gets a bit squashed). Recently, some new types of figures that build upon forest plots have been proposed, including the ‘caterpillar’ plot and the ‘orchard’ plot. See Underappreciated plots.
Figure S1
An example of forest plot showing the effect sizes from each study (and
their 95% CIs) and the overall effect based on model.
The function orchard_plot() in the orchaRd package can make a forest-like plot (termed an ‘orchard’ plot) that can accommodate a large number of studies. A basic orchard plot can be made with:
orchard_plot(mod_ML_lnRR_var, # the fitted multilevel model
             mod = "1",       # "1" plots the overall (intercept-only) effect
             xlab = "Effect size lnRR",
             group = "Study_ID",
             k = TRUE,
             g = TRUE,
             data = dat2_Midolo_2019) +
  scale_x_discrete(labels = c("Overall effect (meta-analytic lnRR)"))
Figure S2
Orchard plot (forest-like plot) showing the effect sizes from each study
(and their 95% CIs), the overall effect and its 95% CIs and 95%
prediction intervals.
We can see that an orchard plot is more informative than a classic forest plot. For example, it shows the distribution of the effect sizes, the number of effect sizes (k) and the number of studies (the number in brackets). An orchard plot not only visualizes the meta-analytic results, but also lets us see features that the statistical results alone cannot show, such as influential data points and outliers that could threaten the robustness of our results. Further, an orchard plot shows not only the 95% CIs (thick whiskers in Figure S2) but also the 95% prediction intervals (PIs; thin whiskers), through which we can visually check the heterogeneity among effect sizes. This is because CIs only incorporate the standard error of the overall effect \beta_0 (SE[\beta_0]), whereas PIs also incorporate the variances of the random effects:
\text{95%CI} = \beta_{0} \pm t_{df[\alpha=0.05]} SE[\beta_{0}], (14)\\ \text{95%PI} = \beta_{0} \pm t_{df[\alpha=0.05]} \sqrt{\tau^2+\sigma^2+SE[\beta_{0}]^2}, (15)
If you want to know the exact values of the PIs, you can use predict():
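For example (applied here to the multilevel model fitted with the imputed VCV matrix above):
predict(mod_ML_lnRR_var) # returns the overall effect together with its 95% CI and 95% PI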
pred se ci.lb ci.ub pi.lb pi.ub
0.0268 0.0170 -0.0066 0.0602 -0.4667 0.5203
- Visualize meta-regression with a categorical moderator
A nice feature of orchard_plot() is that it can also show results from a meta-regression model (Figure S3). The results of the meta-regression with trait as a moderator can be visualized with:
orchard_plot(mod_MLMR_lnRR_trait,
             mod = "trait",
             xlab = "Effect size lnRR",
             group = "Study_ID",
             k = TRUE,
             g = TRUE,
             trunk.size = 1.5,
             data = dat2_Midolo_2019) +
  scale_x_discrete(labels = c("SLA", "Pmass", "Nmass", "Narea", "LMA", "LA", "dC13"))
Figure S3
Orchard plot (forest-like plot) showing the effect sizes from each study
and the overall effect for each subgroup/levels of a given categorical
moderator.
- Visualize meta-regression with a continuous moderator
For a meta-regression with a continuous moderator, we can use bubble_plot() to visualize the results. Let’s visualize the meta-regression with elevation as the continuous moderator:
bubble_plot(mod_MLMR_lnRR_elevation,
            mod = "elevation_log",
            xlab = "Elevation (log-transformed)",
            ylab = "Effect size lnRR",
            group = "Study_ID",
            k = TRUE,
            g = TRUE,
            data = dat2_Midolo_2019,
            legend.pos = "top.left") +
  theme(axis.text.x = element_text(size = 12, colour = "black"),
        axis.text.y = element_text(size = 12, colour = "black"),
        axis.title.x = element_text(size = 12, colour = "black"),
        plot.title = element_text(size = 12, colour = "black"))
Figure S4
Bubble plot showing the results of a meta-regression model: relationship
between a continuous moderator (in our case, elevation) and
effect size magnitude (lnRR).
Checking for publication bias and robustness
- Detecting small study effect
The most well-known form of publication bias is the small-study effect, where effect sizes from “small” studies (with low replication and therefore large uncertainty and low precision) show different, often larger, effects than those from large studies. A straightforward way to detect a small-study effect is to add the uncertainty of the effect size as a moderator, so that the relationship between an effect size and its uncertainty can be quantified. We propose to formulate Egger’s regression (a classic method for detecting funnel plot asymmetry) within the framework of the multilevel model to detect the small-study effect for dependent effect sizes:
z_{i} = \beta_{0} + \beta_1\sqrt{\frac {1} {\tilde{n_i}}} + \mu_{j[i]} + e_{i} + m_{i}, (16)\\ z_{i} = \beta_{0} + \beta_1(\frac {1} {\tilde{n_i}}) + \mu_{j[i]} + e_{i} + m_{i}, (17)
The sampling error \sqrt{\nu_i} is a typical measure of the uncertainty of an effect size z_{i}. However, for some effect size types (e.g., SMD), z_{i} has an intrinsic relationship with its \nu_i (see Table S2), so \nu_i is not a valid moderator for detecting a small-study effect. In Equation 16, we therefore use an adapted sampling error based on the effective sample size \tilde{n} as the moderator. Let’s calculate the adapted sampling variance and error based on \tilde{n} = \frac {n_{iC}n_{iT}} {n_{iC}+n_{iT}} in our example (see Table S2 for the formulas for other effect sizes):
ess.var_cal <- function(dat) {1/dat$n_control + 1/dat$n_treatment} # helper function for the adapted sampling variance based on the effective sample size (1/n_tilde = 1/n_control + 1/n_treatment)
dat2_Midolo_2019$ess.var <- ess.var_cal(dat2_Midolo_2019)          # adapted sampling variance
dat2_Midolo_2019$ess.se <- sqrt(dat2_Midolo_2019$ess.var)          # adapted sampling error (square root of the adapted sampling variance)
Then the Equation 16 can be fitted with:
mod_MLMR_lnRR_ess.se <- rma.mv(yi = lnRR,
                               V = VCV,
                               mods = ~ ess.se, # adapted sampling error (based on effective sample size) added as a moderator to test for a small-study effect
                               random = list(~1 | Study_ID,
                                             ~1 | ES_ID),
                               method = "REML",
                               test = "t",
                               data = dat2_Midolo_2019
                               )
The output of the above model has exactly the same structure as that of the meta-regression models shown earlier:
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-106.2100 212.4201 220.4201 241.0758 220.4511
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0112 0.1060 71 no Study_ID
sigma^2.2 0.0516 0.2272 1294 no ES_ID
Test for Residual Heterogeneity:
QE(df = 1292) = 65891.3832, p-val < .0001
Test of Moderators (coefficient 2):
F(df1 = 1, df2 = 1292) = 1.6336, p-val = 0.2014
Model Results:
estimate se tval df pval ci.lb ci.ub
intrcpt -0.0353 0.0515 -0.6865 1292 0.4925 -0.1364 0.0657
ess.se 0.1231 0.0963 1.2781 1292 0.2014 -0.0658 0.3120
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If you followed the earlier sections of this tutorial, you should be able to interpret these results yourself. Briefly, looking at ess.se under Model Results, the p-value and 95% CIs suggest that there is no statistical relationship between the effect sizes and their errors, meaning that no small-study effect exists. The bubble plot (Figure S5) also indicates that there is no visible correlation between effect size and its error, and the effect sizes are distributed symmetrically in the funnel plot (Figure S6).
Figure S5
Bubble plot showing the relationship between the effect size magnitude
and its adjusted sampling error (effective sample size based). Small
studies (low precision) do not report large effect sizes.
Figure S6
Visual inspection of the typical funnel plot to identify the small study
effect.
During the peer review stage, one referee suggested using the ‘effective’ sample size rather than the inverse standard error. We agree with this point. Therefore, we added a new funnel plot using effective sample size as the y-axis to this tutorial (Figure S6.2). It is important to note that funnel plots using either the inverse standard error or the ‘effective’ sample size are not reliable for detecting publication bias, because they are only visual checks. To properly detect publication bias, a regression-based statistical test should be used, as shown above.
Figure S6.2
Visual inspection of the funnel plot based on effective sample size to
identify the small study effect.
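If you would like to draw such a funnel plot yourself, a minimal sketch is given below (the choice of model object and y-axis scaling are our assumptions; metafor's funnel() offers several yaxis options):
funnel(mod_ML_lnRR_var, yaxis = "seinv", xlab = "Effect size lnRR") # classic funnel plot with the inverse standard error on the y-axis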
- Detecting decline effect
The decline effect, also known as time-lag bias, is another prominent form of publication bias, where effect sizes tend to get closer to zero over time. Testing for a decline effect is important because temporal changes in the evidence base of a given field pose a threat to environmental policy-making, management, and practice. A decline effect can be tested with a meta-regression using publication year (centered to ease interpretation: c(year_{j[i]})) as a moderator:
z_{i} = \beta_{0} + \beta_1c(year_{j[i]}) + \mu_{j[i]} + e_{i} + m_{i}, (18)
Equation 18 can be fitted with the following code:
mod_MLMR_lnRR_Year.c <- rma.mv(yi = lnRR,
                               V = VCV,
                               mods = ~ Year.c, # centered publication year added as a moderator to detect a decline effect
                               random = list(~1 | Study_ID,
                                             ~1 | ES_ID),
                               method = "REML",
                               test = "t",
                               data = dat2_Midolo_2019,
                               sparse = TRUE
                               )
As shown in the following results, the regression slope for Year.c is 0.002 (95% CIs = [-0.0022 to 0.0062]), which is very small and not statistically different from zero (tval = 0.9135 and p-val = 0.3611), suggesting that studies with statistically significant results were not published earlier than those with non-significant results, i.e., there is no time-lag bias (Figure S7).
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-106.5039 213.0079 221.0079 241.6637 221.0390
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0114 0.1067 71 no Study_ID
sigma^2.2 0.0517 0.2273 1294 no ES_ID
Test for Residual Heterogeneity:
QE(df = 1292) = 65896.5620, p-val < .0001
Test of Moderators (coefficient 2):
F(df1 = 1, df2 = 1292) = 0.8345, p-val = 0.3611
Model Results:
estimate se tval df pval ci.lb ci.ub
intrcpt 0.0267 0.0170 1.5668 1292 0.1174 -0.0067 0.0601
Year.c 0.0020 0.0021 0.9135 1292 0.3611 -0.0022 0.0062
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Figure S7
Bubble plot showing the relationship between the effect size magnitude
and publication year (centered on 2008). There is no temporal trend in
the changes of the effect size magnitude.
- Correcting for publication bias
Besides detecting publication bias, it is also necessary to correct for such bias to check the robustness of the model estimates. The intercept \beta_{0} in Equation 17 can be interpreted as the publication-bias-corrected overall effect (the 'true' effect), because \beta_{0} is the model estimate when setting \frac {1} {\tilde{n}} = 0, which corresponds to a dataset with an infinite sample size (high precision) and therefore no small-study effect. To obtain the publication-bias-corrected overall effect, we fit Equation 17 with:
mod_MLMR_lnRR_ess.var <- rma.mv(yi = lnRR, V = VCV,
                                mods = ~ ess.var, # add the adjusted sampling error to obtain the publication-bias-corrected overall effect, which can be regarded as the 'true' effect
                                random = list(~1 | Study_ID, ~1 | ES_ID),
                                method = "REML", test = "t",
                                data = dat2_Midolo_2019, sparse = TRUE)
intrcpt printed under Model Results shows that the intercept \beta_0 (-0.0029, 95% CIs = [-0.0659 to 0.0602]) remains statistically non-significant (p-value = 0.9285), although its magnitude and direction change. This suggests that the meta-analytic estimate of this dataset ([1]) is robust to publication bias (Table S4).
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-106.4500 212.8999 220.8999 241.5557 220.9310
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0113 0.1065 71 no Study_ID
sigma^2.2 0.0516 0.2272 1294 no ES_ID
Test for Residual Heterogeneity:
QE(df = 1292) = 65878.9167, p-val < .0001
Test of Moderators (coefficient 2):
F(df1 = 1, df2 = 1292) = 1.1843, p-val = 0.2767
Model Results:
estimate se tval df pval ci.lb ci.ub
intrcpt -0.0029 0.0321 -0.0897 1292 0.9285 -0.0659 0.0602
ess.var 0.1048 0.0963 1.0883 1292 0.2767 -0.0841 0.2937
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Table S4 Comparison of original meta-analytic estimates and bias-corrected meta-analytic estimates.
Accounting for heterogeneity when detecting publication bias
In our main text, we introduce Equation 19 to simultaneously detect two forms of publication bias while accounting for heterogeneity, which increases power and reduces the Type I error rate:
z_{i} = \beta_{0} + \beta_1\sqrt{\frac {1} {\tilde{n_i}}} + \beta_2c(year_{j[i]}) + \sum \beta_{h}x_{h[i]} + \mu_{j[i]} + e_{i} + m_{i}, (19)
To fit Equation 19 via rma.mv, we only need to include the adjusted sampling error (\sqrt{\frac {1} {\tilde{n_i}}}), publication year (c(year_{j[i]})), and other important study characteristics as moderator variables in a meta-regression (i.e., a multiple meta-regression; see section Explaining variance with meta-regression). We arbitrarily select elevation and trait as moderators to demonstrate:
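The exact call used to produce the output below is not shown in the source, so the following is only a minimal sketch, inferred from the output (the object name mod_MLMR_lnRR_whole follows the object profiled later in this tutorial; the intercept appears to have been removed so that each trait level gets its own estimate):
# Sketch (assumed call): fit Equation 19 with the adjusted sampling error,
# centered publication year, and study characteristics as moderators.
mod_MLMR_lnRR_whole <- rma.mv(yi = lnRR, V = VCV,
                              mods = ~ ess.var + Year.c + elevation_log + trait - 1, # "- 1" removes the intercept, matching the output below
                              random = list(~1 | Study_ID, ~1 | ES_ID),
                              method = "REML", test = "t",
                              data = dat2_Midolo_2019, sparse = TRUE)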
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-67.0224 134.0449 158.0449 219.9377 158.2904
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0099 0.0996 71 no Study_ID
sigma^2.2 0.0486 0.2204 1294 no ES_ID
Test for Residual Heterogeneity:
QE(df = 1284) = 63164.1842, p-val < .0001
Test of Moderators (coefficients 1:10):
F(df1 = 10, df2 = 1284) = 8.9597, p-val < .0001
Model Results:
estimate se tval df pval ci.lb ci.ub
ess.var 0.0119 0.0936 0.1275 1284 0.8985 -0.1717 0.1956
Year.c 0.0021 0.0021 1.0005 1284 0.3172 -0.0020 0.0061
elevation_log 0.0284 0.0100 2.8385 1284 0.0046 0.0088 0.0481 **
traitdC13 -0.1900 0.0707 -2.6891 1284 0.0073 -0.3286 -0.0514 **
traitLA -0.2446 0.0713 -3.4329 1284 0.0006 -0.3844 -0.1048 ***
traitLMA -0.1165 0.0710 -1.6399 1284 0.1013 -0.2559 0.0229
traitNarea -0.0962 0.0708 -1.3597 1284 0.1742 -0.2351 0.0426
traitNmass -0.0939 0.0699 -1.3432 1284 0.1795 -0.2311 0.0433
traitPmass -0.0150 0.0746 -0.2016 1284 0.8403 -0.1614 0.1313
traitSLA -0.2035 0.0698 -2.9160 1284 0.0036 -0.3404 -0.0666 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the model output above, we see that the slopes of \sqrt{\frac {1} {\tilde{n_i}}} (\beta_1) and c(year_{j[i]}) (\beta_2) remain non-significant, albeit with changes in magnitude, confirming that there is no evidence of publication bias.
Technical Recommendation
Note that when running complex models, some model parameters cannot be estimated well. This is especially so for variance components. Therefore, it is good practice to check whether all model parameters are identifiable, which can be done with the profile() function in the metafor package:
profile(mod_MLMR_lnRR_whole, progbar = FALSE) # profile the variance components: between-study ($\sigma^2_1$ in the following figure) and within-study ($\sigma^2_2$) variance
From the variance component profiles, we can see that the estimates are quite stable: the likelihood drops quickly on either side of the maximum (vertical line). This means that the estimates of the model parameters are trustworthy, but this is not always the case for complex meta-analytic models. Note that in this example a quasi-Newton method was used to maximise the (log-)likelihood over parameters such as the variance components (REML does not have a closed-form solution, so numerical optimization techniques have to be used).
- Sensitivity of assumption of within-study (sampling) correlation
Here, we show how to perform a sensitivity analysis examining the extent to which the overall effect (e.g., \beta_{0}) is sensitive to the within-study (sampling) correlation \rho assumed when constructing the variance-covariance matrix (VCV).
First, set a series of \rho values (i.e., 0.3, 0.5, 0.7, 0.9); we choose these values arbitrarily, but you can set them based on your expertise in your field:
rho_range <- c(0.3, 0.5, 0.7, 0.9)
Then, we write a loop to repeatedly run the specified model, changing one \rho value at a time:
ML_VCV_range <- list() # a list to store the models fitted with varying rho
for (i in 1:length(rho_range)) {
  VCV_range <- vcalc(vi = lnRRV, cluster = Study_ID, obs = ES_ID,
                     rho = rho_range[i],
                     data = dat2_Midolo_2019) # impute the VCV matrix with the current value of rho
  ML_VCV_range[[i]] <- rma.mv(yi = lnRR, V = VCV_range, # VCV matrix with the current value of rho
                              random = list(~1 | Study_ID, ~1 | ES_ID),
                              method = "REML", test = "t",
                              data = dat2_Midolo_2019) # run the model for each rho value
}
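Table S5 can then be assembled from these fitted models; below is a minimal sketch (the object and column names here are ours, not from the original code):
# Sketch: extract the overall effect and its 95% CIs from each model fitted
# with a different assumed rho, for a table like Table S5.
rho_sensitivity <- data.frame(rho   = rho_range,
                              est   = sapply(ML_VCV_range, function(m) m$b[1]),
                              ci.lb = sapply(ML_VCV_range, function(m) m$ci.lb[1]),
                              ci.ub = sapply(ML_VCV_range, function(m) m$ci.ub[1]))
rho_sensitivity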
From Table S5, we see that the overall effect \beta_{0} does not change appreciably across the \rho values, indicating that the model estimates are robust to the assumed value of \rho.
Table S5 Sensitivity analysis examining the robustness of the overall effect (i.e., \beta_0) to the assumption of \rho values.
- Leave-one-out analysis
The sensitivity (robustness) of the results can also be evaluated using leave-one-out analysis: repeatedly fitting the model to the dataset with one unit/entry (e.g., study) removed at a time. 'One unit/entry' denotes a row of the dataset fed to the model.
Leave-one-out analysis is a useful method to diagnose outliers and influential studies (those that exert a disproportionate influence on model estimates such as the overall effect \beta_0 and its 95% CIs). As you might imagine, leave-one-out analysis re-fits the model N (the number of studies) times, so it may take quite a long time to finish such an analysis if you have a large N.
For fixed-effect and random-effects models, leave-one-out analysis can be easily performed using the leave1out() function from the metafor package. The syntax is just one line:
leave1out(mod_FE_lnRR)
Let’s have a look at the results of the first 10 rows:
estimate se zval pval ci.lb ci.ub Q Qp I2
1 -0.0144 0.0004 -34.0889 0.0000 -0.0152 -0.0135 41265.3873 0.0000 96.8690
2 -0.0144 0.0004 -34.0869 0.0000 -0.0152 -0.0135 41264.7609 0.0000 96.8690
3 -0.0144 0.0004 -34.0954 0.0000 -0.0152 -0.0135 41266.2873 0.0000 96.8691
4 -0.0144 0.0004 -34.0948 0.0000 -0.0152 -0.0135 41266.5291 0.0000 96.8691
5 -0.0143 0.0004 -34.0330 0.0000 -0.0152 -0.0135 41257.2615 0.0000 96.8684
6 -0.0143 0.0004 -33.9758 0.0000 -0.0151 -0.0135 41220.5405 0.0000 96.8656
7 -0.0144 0.0004 -34.0714 0.0000 -0.0152 -0.0135 41264.3804 0.0000 96.8690
8 -0.0143 0.0004 -34.0464 0.0000 -0.0152 -0.0135 41246.8031 0.0000 96.8676
9 -0.0142 0.0004 -33.6629 0.0000 -0.0150 -0.0134 41245.2609 0.0000 96.8675
10 -0.0147 0.0004 -34.8368 0.0000 -0.0155 -0.0139 41122.8168 0.0000 96.8582
H2
1 31.9392
2 31.9387
3 31.9399
4 31.9400
5 31.9329
6 31.9044
7 31.9384
8 31.9248
9 31.9236
10 31.8288
For our proposed multilevel model, no existing package or function is readily available, but you can modify the following code to implement leave-one-out analysis within the multilevel framework. Later, we will write a corresponding function and wrap it into the orchaRd package. The syntax, using study as the unit deleted at a time, is:
dat2_Midolo_2019$leave_out <- as.factor(dat2_Midolo_2019$study_name) # create the variable that will be left out
leave1out_mod <- list() # create a list to contain the fitted models
for (i in 1:length(levels(dat2_Midolo_2019$leave_out))) {
  # create the data with one study removed at a time
  dat <- dat2_Midolo_2019 %>% filter(leave_out != levels(dat2_Midolo_2019$leave_out)[i])
  # impute the VCV matrix for the reduced dataset
  VCV_leave1out <- impute_covariance_matrix(vi = dat$lnRRV, cluster = dat$Study_ID, r = 0.5)
  # repeatedly run the multilevel model, leaving out one study at a time
  leave1out_mod[[i]] <- rma.mv(yi = lnRR, V = VCV_leave1out,
                               random = list(~1 | Study_ID, ~1 | ES_ID),
                               method = "REML", test = "t",
                               data = dat, sparse = TRUE)
}
# a simple function to extract the intercept and 95% CIs from each fitted model
est.func <- function(mod) {
  data.frame(est = mod$b, lower = mod$ci.lb, upper = mod$ci.ub)
}
# turn the model estimates into a data frame
leave1out_results <- lapply(leave1out_mod, est.func) %>%
  bind_rows() %>%
  mutate(left_out = levels(dat2_Midolo_2019$leave_out))
The results of the leave-one-out analysis are shown in Figure S8, where we do not observe any outliers or influential studies with obviously disproportionate effects on the model estimates.
Figure S8 Leave-one-out analysis showing overall effects \beta_0 and 95% CIs based on dataset with one study left out at a time from model fitting.
The distribution of overall effect (\beta_0) after deleting one study at a time also confirms no obvious outlier (Figure S9).
Figure S9 Leave-one-out analysis showing distribution of overall effects \beta_0 based on dataset with one study left out at a time from model fitting.
There are also other diagnostic indices that can be calculated from a leave-one-out analysis, for example Cook's distance, the difference in the regression coefficients (DFBETAS), and the hat value:
Cook's distance = the Mahalanobis distance between the predicted (average) effect based on the full dataset and that based on the dataset with one study excluded from model fitting. A study may be considered an 'outlier' if the lower-tail area of a chi-square distribution (with p degrees of freedom, where p = the number of model coefficients) cut off by the Cook's distance is more than 50%.
DFBETAS = the change in the predicted (average) effect (expressed in standard deviations) after excluding a specific study from the model fitting. A study may be considered an 'outlier' if the absolute value of DFBETAS is more than 1.
Hat value = the diagonal elements of the hat matrix, which transforms the vector of observed effect sizes into the vector of predicted values. The hat value reflects how much each study influences its own predicted value and can therefore help identify outlying studies. A study may be considered an 'outlier' if the hat value is more than 3(\frac {p} {N}) (where N = the number of studies).
One point is worth noting - these indices can be easily computed in R, but the above cut-offs are arbitrary. Therefore, if you want to use these indices, you need to apply informed judgement. Below we show how to calculate these indices.
Cook’s distance can be calculated via
cooks.distance()
:
cooks.distance(mod_ML_lnRR_VCV, cluster = Study_ID)
Figure S10 Cook’s distance showing how much all of the predicted effects in the model change when one study is deleted.
DFBETAS and hat values can be computed with dfbetas() and hatvalues(), respectively. The syntax is just as simple as that of cooks.distance(). Given the extensive computation these methods require (our example dataset is very large), we do not show the results.
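For completeness, a syntax sketch (not run here) looks like this:
# Sketch: DFBETAS and hat values for the multilevel model (not run because of
# the heavy computation on this large dataset).
dfbetas(mod_ML_lnRR_VCV)   # change in model coefficients when each effect size is removed
hatvalues(mod_ML_lnRR_VCV) # leverage (hat) values for each effect size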
- Quantitative critical appraisal
Besides qualitative methods for critically appraising the risk of bias of the primary studies included in a meta-analysis, we recommend quantitative methods. When compiling your dataset, you need to extract and code variables related to risk-of-bias (RoB) characteristics, for example, the use of blinding, randomization, and selective reporting. You can then include these variables as moderators in a meta-regression. Unfortunately, at present, no environmental meta-analyses (at least among our surveyed papers) have coded relevant moderators, so we are unable to provide a worked example. The implementation is straightforward - you just need to use the mods argument in the rma.mv() function and supply the RoB variables to mods using a formula: mods = ~ RoB.
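A hypothetical sketch is given below; the RoB column does not exist in our example dataset and is included only for illustration:
# Hypothetical sketch: a coded risk-of-bias moderator (e.g., RoB = "low"/"high")
# added to the multilevel meta-regression. The RoB variable is assumed, not real.
mod_RoB <- rma.mv(yi = lnRR, V = VCV,
                  mods = ~ RoB,
                  random = list(~1 | Study_ID, ~1 | ES_ID),
                  method = "REML", test = "t",
                  data = dat2_Midolo_2019)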
Our team (Shinichi Nakagawa) has published a paper that examined whether selective reporting inflates effect size estimates:
Parker T H, Greig E I, Nakagawa S, et al. Subspecies status and methods explain strength of response to local versus foreign song by oscine birds in meta-analysis. Animal Behaviour, 2018, 142: 1-17.
In this paper, we ran a meta-regression with selective reporting as a moderator and found that selective reporting inflated the effect size estimates. This paper demonstrates the need for the transparent reporting and open archiving that we advocate in our main text. We refer readers to the relevant code archived on the Open Science Framework to reproduce this quantitative critical appraisal.
Other relevant and advanced issues
According to our survey, many environmental meta-analyses have missing values for some of the primary studies:
missing standard deviations or sample sizes associated with means, making effect size calculation infeasible;
missing descriptions of study-characteristic variables, making coding of moderator variables impossible.
A common approach to dealing with these issues is to delete the affected studies, but this reduces statistical power and the precision of model coefficients (e.g., \beta_0 and \beta_1) due to the loss of sample size (here, the number of deleted studies), and limits the ability to reveal the drivers of heterogeneity between effect sizes due to the loss of explanatory variables (i.e., moderators).
Here, we show how to impute missing standard deviations, a common issue in environmental meta-analyses. We introduce two readily implementable imputation techniques, both of which are (partially) based on the mean-variance relationship (the strong correlation between standard deviation and mean).
The first method is simple multiple imputation, which imputes the standard deviation using the coefficient of variation (CV) from all complete cases and uses a resampling method to account for the uncertainty introduced by borrowing the CV from complete cases.
The second method is an improved method for imputing standard deviations for lnRR, which uses a weighted average CV to improve the precision of the sampling variance estimates. We introduce an lnRR-specific imputation method because lnRR is the most commonly used effect size statistic in environmental meta-analyses (for details, see the survey results in the main text).
Before demonstrating these methods, we must artificially create incomplete standard deviations in our dataset. We randomly select 20% of the studies and delete their standard deviations from both the control (sd_control) and treatment groups (sd_treatment).
missing_SD <- dat2_Midolo_2019 # copy the dataset into missing_SD, which is used to illustrate the imputation methods
set.seed(2022) # set the seed (for the random number generator); it is good practice to do so to make the following results fully reproducible
stdies <- sample(unique(missing_SD$study_name),
                 size = 0.2 * length(unique(missing_SD$study_name))) # randomly sample 20% of studies
missing_SD[which(missing_SD$study_name %in% stdies),
           c("sd_treatment", "sd_control")] <- NA # create missingness of SD at the study level
Let’s have a look at the summary of artificially-created missingness in the dataset:
Summary of missingness:
COLUMN PERCENT_MISSINGNESS IMPUTATIONS
Study_ID 0 0
study_name 0 0
species 0 0
trait 0 0
treatment 0 0
control 0 0
sd_treatment 12 159
sd_control 12 159
n_treatment 0 0
n_control 0 0
elevation 0 0
elevation_log 0 0
lnRR 0 0
lnRRV 0 0
ES_ID 0 0
ess.var 0 0
ess.se 0 0
n_tilde 0 0
Year 0 0
Year.c 0 0
leave_out 0 0
Total missingness: 1.2% (318 imputations needed)
Below let’s show the implementation one by one.
- Simple multiple imputation
Multiple imputation methods are generally complex, and a full description is beyond the scope of our paper. Many multiple imputation methods can be implemented via the powerful mice package; if you are interested in these advanced methods, see its documentation (help(mice)). Here, we introduce a simple multiple imputation method, which is based on the strong correlation between standard deviation and mean values and uses a resampling approach to account for the uncertainty of the imputation itself, such that the Type I error rate can be reduced. This method can be implemented via the impute_SD() function in the metagear package. The syntax is:
dat_simple <- impute_SD(missing_SD,
                        columnSDnames = c("sd_treatment", "sd_control"), # labels of the column(s) with missing SDs
                        columnXnames = c("treatment", "control"), # labels of the column(s) with the mean values for each SD
                        method = "HotDeck")
Summary of missingness:
COLUMN PERCENT_MISSINGNESS IMPUTATIONS
Study_ID 0 0
study_name 0 0
species 0 0
trait 0 0
treatment 0 0
control 0 0
sd_treatment 0 0
sd_control 0 0
n_treatment 0 0
n_control 0 0
elevation 0 0
elevation_log 0 0
lnRR 0 0
lnRRV 0 0
ES_ID 0 0
ess.var 0 0
ess.se 0 0
n_tilde 0 0
Year 0 0
Year.c 0 0
leave_out 0 0
Total missingness: 0% (0 imputations needed)
We see that all the missing standard deviations (in both control and treatment groups) are imputed.
- Improved imputation
Our team (Shinichi Nakagawa and Malgorzata Lagisz) recently developed a tailored method for imputing missing standard deviations for lnRR. Our extensive simulations indicate that this method outperforms the simple method based on the mean-variance relationship. For details, see the following paper:
Nakagawa, S., Noble, D. W., Lagisz, M., Spake, R., Viechtbauer, W., & Senior, A. M. (2022, May 19). A robust and readily implementable method for the meta-analysis of response ratios with and without missing standard deviations. https://doi.org/10.32942/osf.io/7thx9
We expect this method to help environmental meta-analysts a great deal because lnRR is the most commonly used effect size statistic in the field. In brief, this method is based on a weighted average CV rather than a simple average CV. Using a weighted average CV gives some data points more weight than others, such that the precision of the sampling variance estimates is improved. Below, we borrow the custom functions we wrote for the above paper to show the implementation.
cv_avg() = compute the weighted average (squared) CV within a study and the weighted average (squared) CV between studies
lnrr_laj() = compute the point estimate of lnRR based on a Taylor expansion
v_lnrr_laj() = compute the sampling variance of lnRR based on a second-order Taylor expansion
The full code is given below:
dat_improved <- missing_SD %>%
  mutate(cv_control = na_if(sd_control / control, Inf),
         cv_treatment = na_if(sd_treatment / treatment, Inf)) # first calculate the CVs; missing SDs are ignored at this step
dat_improved <- cv_avg(x = control, sd = sd_control,
                       n = n_control, group = study_name,
                       label = "1", # control group
                       data = dat_improved) # calculate the weighted average between-study CV, which will replace missing CVs
dat_improved <- cv_avg(x = treatment, sd = sd_treatment,
                       n = n_treatment, group = study_name,
                       label = "2", # treatment group
                       data = dat_improved)
dat_improved <- dat_improved %>%
  mutate(cv2_control_new = if_else(is.na(cv_control), b_CV2_1, cv_control^2),
         cv2_treatment_new = if_else(is.na(cv_treatment), b_CV2_2, cv_treatment^2)) # use the weighted between-study CV^2 where the CV is missing (due to missing SD)
dat_improved <- dat_improved %>%
  mutate(lnRR_new = lnrr_laj(m1 = control, m2 = treatment,
                             cv1_2 = cv2_control_new, cv2_2 = cv2_treatment_new,
                             n1 = n_control, n2 = n_treatment),
         lnRRV_new = v_lnrr_laj(cv1_2 = cv2_control_new, n1 = n_control,
                                cv2_2 = cv2_treatment_new, n2 = n_treatment)) # compute the new point estimate of lnRR and its sampling variance, using the between-study CV^2 when missing and the observed CV^2 otherwise
We summarize the model estimates based on the above two imputation methods in Table S6.
Table S6 Comparison of estimates of model coefficients based on datasets with two simple imputation method and improved version.
- Multiple sources of non-independence
Environmental meta-analytic datasets may have many ways in which studies are non-independent of one another. Several 'clustering variables', other than study identity, may render effect sizes dependent. As you can probably imagine, species (or genus) could be such a clustering variable in an environmental dataset including multiple species, as effect sizes derived from the same species are likely to be more similar to each other than effect sizes from different species due to shared genetics, shared history, or phylogenetic relatedness. The multilevel model has a flexible random effects structure that can handle complex non-independence, such as taxonomic dependence, by adding the associated clustering variables (e.g., species) as additional random effects:
z_{i} = \beta_{0} + a_{k[i]} + s_{k[i]} + \mu_{j[i]} + e_{i} + m_{i}, (19)
The random argument in rma.mv offers an elegant way to construct the random effects structure. For example, if we want to account for both study and species (taxonomic) effects, we supply them via the random argument and use list() to bind all of these clustering variables together: random = list(~ 1 | species, ~ 1 | Study_ID, ~ 1 | ES_ID).
The complete code will be:
rma.mv(yi = lnRR, V = VCV,
       random = list(~1 | species,  # add species identity as a random effect, which allows effect sizes to vary between species
                     ~1 | Study_ID, # add study identity as a random effect, which allows effect sizes to vary between studies
                     ~1 | ES_ID),   # add effect-size identity as a random effect, which allows effect sizes to vary within studies
       method = "REML", test = "t",
       data = dat2_Midolo_2019)
The output is:
Multivariate Meta-Analysis Model (k = 1294; method: REML)
logLik Deviance AIC BIC AICc
-107.3514 214.7028 222.7028 243.3617 222.7338
Variance Components:
estim sqrt nlvls fixed factor
sigma^2.1 0.0000 0.0064 109 no species
sigma^2.2 0.0113 0.1064 71 no Study_ID
sigma^2.3 0.0516 0.2272 1294 no ES_ID
Test for Heterogeneity:
Q(df = 1293) = 65913.4258, p-val < .0001
Model Results:
estimate se tval df pval ci.lb ci.ub
0.0268 0.0170 1.5730 1293 0.1160 -0.0066 0.0601
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that the results given under Variance Components show a variance component for each of the three random effects, although there appears to be almost no heterogeneity between species.
Note
According to our survey, only a few (XX%) environmental meta-analyses used a multilevel model. Therefore, environmental meta-analysts may not be familiar with 'random effects'. Many researchers have attempted to define the term (see Andrew Gelman's nice summary). Here, we give an intuitive explanation in the context of an environmental meta-analysis:
When a variable in a meta-analytic model is specified as a random effect, we assume that it contributes noise (variation) around the overall mean and thus has a random effect on the overall mean. For example, treating 'species' as a random effect assumes that the true effects are heterogeneous among species and enables us to quantify how much variation there is among species. In contrast, treating species as a fixed effect assumes that each species level has the same systematic effect across different studies on the overall effect; this is equivalent to asking whether one species responds more to an environmental stressor than others.
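To make this distinction concrete, here is a small contrast sketch (not part of the original tutorial) showing species specified as a random effect versus as a fixed effect (moderator) in rma.mv():
# species as a random effect: quantifies how much true effects vary among species
mod_species_random <- rma.mv(yi = lnRR, V = VCV,
                             random = list(~1 | species, ~1 | Study_ID, ~1 | ES_ID),
                             method = "REML", test = "t", data = dat2_Midolo_2019)
# species as a fixed effect (moderator): asks whether particular species respond
# systematically more (or less) than others
mod_species_fixed <- rma.mv(yi = lnRR, V = VCV,
                            mods = ~ species,
                            random = list(~1 | Study_ID, ~1 | ES_ID),
                            method = "REML", test = "t", data = dat2_Midolo_2019)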
- Selecting a 'best' random effects structure
Here, we introduce some rules of thumb to help decide how to specify random effects structures that capture the hierarchical data structure (e.g., non-independence due to 'clustering', 'nesting', or 'crossing'), which may be relevant to your meta-analysis.
Rule 1
Consider whether the variable in question is a true source of heterogeneity according to your knowledge of your field - we advise against solely using a data-driven random effects structure without considering whether the tested variable reflects true heterogeneity.
Rule 2
A random effect should have > 5 levels, so that its variance component can be estimated approximately without bias.
Rule 3
Examine whether specifying the random effects structure improves the model, for example by examining fit statistics (e.g., AIC and BIC) or goodness-of-fit (e.g., R^2_{marginal}); see the sketch after this list.
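As a sketch of Rule 3 (the model objects mod_without and mod_with are hypothetical stand-ins for two candidate fits that differ only in the random effect in question and are fitted with method = "ML"):
# Sketch: compare information criteria of two candidate random effects structures
# and inspect goodness-of-fit of a fitted model.
fitstats(mod_without, mod_with) # log-likelihood, deviance, AIC, BIC, AICc side by side
orchaRd::r2_ml(mod_with)        # marginal and conditional R^2 (most meaningful when moderators are included)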
Study identity (Study_ID) and effect size identity (ES_ID) are typical random effects for an environmental meta-analysis. Let's use the dataset of [1] to empirically test this point and show how to select the 'best' random effects structure using information-theoretic approaches alongside likelihood ratio tests (implemented via the function anova.rma()). There are three candidate random effects:
Effect size identity (ES_ID) - unique ID for each pairwise comparison for effect size calculation.
Study identity (Study_ID) - unique ID for each included primary study.
Species identity (species) - name of species included in the primary studies.
Let’s first fit a null model without including the above random effects candidates as the default reduced model:
null.re <- rma.mv(yi = lnRR, V = VCV,
                  method = "ML", # set method = "ML" rather than "REML" when conducting model selection
                  test = "t",
                  data = dat2_Midolo_2019)
Then, add study identity (Study_ID) as a random effect via the random argument:
study <- rma.mv(yi = lnRR, V = VCV,
                random = ~1 | Study_ID,
                method = "ML", test = "t",
                data = dat2_Midolo_2019)
Let's conduct a likelihood ratio test to compare the quality of the two fitted models in terms of AIC, BIC, AICc, and log-likelihood values:
anova.rma(null.re, study)
df AIC BIC AICc logLik LRT pval
Full 2 47642.9931 47653.3240 47643.0024 -23819.4965
Reduced 1 60148.2005 60153.3660 60148.2036 -30073.1003 12507.2075 <.0001
QE
Full 65913.4258
Reduced 65913.4258
We see that the model (Full) with Study_ID as a random effect has a lower AIC value than the null model (Reduced). The likelihood ratio test indicates that Study_ID significantly improves model fit (pval < 0.0001). Note that we recommend using maximum likelihood (ML) rather than restricted maximum likelihood (REML) when using information criteria, because using likelihood-based methods (and hence information criteria) to compare models with different fixed effects that were fitted by REML will generally yield nonsense. This is an important point for model selection (see the next section). It is therefore important to remember to set method = "ML" rather than method = "REML" when comparing models with different random effects (and also fixed effects; see the next section).
In the same vein, we can examine whether ES_ID contributes to model fit:
es <- rma.mv(yi = lnRR, V = VCV, random = ~1 | ES_ID, method = "ML", test = "t", data = dat2_Midolo_2019 )
anova.rma(null.re, es)
df AIC BIC AICc logLik LRT pval
Full 2 270.8693 281.2003 270.8786 -133.4347
Reduced 1 60148.2005 60153.3660 60148.2036 -30073.1003 59879.3312 <.0001
QE
Full 65913.4258
Reduced 65913.4258
From the AIC, LRT, and pval given in the above output, we see that the model with ES_ID as a random effect (Full) is much better than the model without a random effect (Reduced). Therefore, effect size identity is an important random effect that our model should account for.
Next, let's test whether adding Study_ID on top of ES_ID further improves model fit:
study.es <- rma.mv(yi = lnRR, V = VCV,
                   random = list(~1 | Study_ID, ~1 | ES_ID),
                   method = "ML", test = "t",
                   data = dat2_Midolo_2019)
anova.rma(study.es, es)
df AIC BIC AICc logLik LRT pval QE
Full 3 221.5487 237.0451 221.5673 -107.7743 65913.4258
Reduced 2 270.8693 281.2003 270.8786 -133.4347 51.3207 <.0001 65913.4258
As expected, the full model has a smaller AIC value than the reduced model, indicating that the model specifying the nested random effects structure with Study_ID and ES_ID (i.e., accounting for non-independence) has a better fit (Full) than the model ignoring non-independence (Reduced). Next, let's explore whether species identity is an important random effect. First, add the coded variable species as a random effect term via the random argument:
species.study.es <- rma.mv(yi = lnRR, V = VCV, random = list(~1 | species, ~1 | Study_ID, ~1 | ES_ID), method = "ML", test = "t", data = dat2_Midolo_2019 )
anova.rma(species.study.es,study.es)
df AIC BIC AICc logLik LRT pval QE
Full 4 223.5484 244.2103 223.5794 -107.7742 65913.4258
Reduced 3 221.5487 237.0451 221.5673 -107.7743 0.0003 0.9864 65913.4258
Looking at the above output, we see that adding species as a random effect barely changes the AIC value (and does not improve it). This suggests that intraspecific responses of leaf traits are consistent across plant species, i.e., there is only a small amount of heterogeneity among species. This is easily corroborated by calculating I^2 at the species level:
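The species-level I^2 shown below can be obtained with the i2_ml() function from the orchaRd package (a sketch; the model object name follows the code above, and the exact call is not shown in the source):
# Sketch: multilevel I^2 (total and per random effect) via orchaRd
orchaRd::i2_ml(species.study.es)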
I2_Total I2_species I2_Study_ID I2_ES_ID
99.63048070 0.04486139 17.45782398 82.12779532
As noted in our main text, model selection is a powerful method for:
quantifying the importance of moderators in explaining heterogeneity, which is useful when looking for, for example, global drivers of environmental change;
multimodel inference, which makes inferences about the moderators across models with all possible combinations of moderators rather than from a single 'best' model;
multimodel prediction, which predicts the (average) effect of a moderator and its CIs across models with all possible combinations of moderators rather than from a single 'best' model.
Below we show how to conduct model selection and multimodel inference using an information-theoretic approach (although R^2-based model selection is a reasonable option, it is not preferable here). To do so, the metafor package needs to borrow functionality from model-selection packages like MuMIn and glmulti. We will use the MuMIn package for illustration because the corresponding syntax is more straightforward than that of glmulti. Let's stay with the dataset of [1]. This dataset only has two moderators, elevation and trait, so we include an additional term - their interaction - to illustrate a more complex model. We would like to emphasize one point again - in your meta-analysis, you need to select your moderators based on their a priori plausibility (the predefined environmental questions you are going to address in your analyses) rather than adding or removing moderators until you get a significant model.
- Selecting a 'best' model
By 'best model', we mean the model with an acceptable amount of information loss when used to approximate the true data-generating mechanism. To select the best model, we first fit a multilevel multi-moderator meta-analytic model with all plausible moderators (Equation 19; the full model). Then, we dredge the full model to produce models with all possible combinations of the moderators from the full model (note that you need to use maximum likelihood (ML) rather than restricted ML; see the earlier section for an explanation):
mod.full <- rma.mv(yi = lnRR, V = lnRRV,
                   mods = ~ trait * elevation_log,
                   random = list(~1 | Study_ID, ~1 | ES_ID),
                   method = "ML", test = "t",
                   data = dat2_Midolo_2019) # fit Equation 19 - the multilevel multi-moderator meta-analytic model (full model with all plausible moderators)
eval(metafor:::.MuMIn) # extract the helper functions from MuMIn and make them usable in metafor
mod.candidate <- dredge(mod.full, beta = "none", evaluate = TRUE,
                        rank = "AICc", trace = 2) # dredge the full model to produce all possible models
From Table S7, we see the fit statistics (log-likelihood and AICc) for all possible models.
Table S7 Fit statistics of models with different combinations of moderators included in the full model (Equation 19 - multilevel multi-moderator meta-analytic model).
Next, let's select a set of best models. This can be done using a rule of thumb, for example, delta AICc <= 2:
subset(mod.candidate, delta <= 2, recalc.weights = FALSE)
Global model call: rma.mv(yi = lnRR, V = lnRRV, mods = ~trait * elevation_log, random = list(~1 |
Study_ID, ~1 | ES_ID), data = dat2_Midolo_2019, method = "ML",
test = "t", sparse = TRUE)
---
Model selection table
(Int) elv_log trt elv_log:trt df logLik AICc delta weight
8 + -0.01991 + + 16 -74.686 181.8 0 1
Models ranked by AICc(x)
- Model averaging
We can also make multimodel inferences using model averaging. You might recall that in the earlier sections, all model inference (tests of model slopes, e.g., \beta_1) was based on classic null hypothesis testing. The advantage of model averaging is that it takes into account the 'weights' of multiple plausible models. The 'weights' are Akaike weights, which can be considered the probability that a specific model has the least information loss (see the explanation above); put differently, you can treat the weights as model probabilities. Note that this dataset is not an ideal example to illustrate model averaging because the weight of the first best model is almost 1 and those of the other models are 0 (see the last column of Table S7). Unfortunately, we have very limited datasets from our survey due to the low data sharing rate in the environmental sciences. In any case, the logic and syntax are the same:
summary(model.avg(mod.candidate))
Call:
model.avg(object = mod.candidate)
Component model call:
rma.mv(yi = lnRR, V = lnRRV, mods = ~<5 unique rhs>, random = list(~1 |
Study_ID, ~1 | ES_ID), data = dat2_Midolo_2019, method = ML, test = t,
sparse = TRUE)
Component models:
df logLik AICc delta weight
123 16 -74.69 181.80 0.00 1
12 10 -92.66 205.50 23.70 0
2 9 -95.86 209.86 28.06 0
1 4 -135.48 279.00 97.20 0
(Null) 3 -138.73 283.48 101.68 0
Term codes:
elevation_log trait elevation_log:trait
1 2 3
Model-averaged coefficients:
(full average)
Estimate Std. Error z value Pr(>|z|)
intrcpt 0.115566 0.141088 0.819 0.412727
elevation_log -0.019907 0.022636 0.879 0.379164
traitLA -0.097898 0.204791 0.478 0.632624
traitLMA -0.353229 0.188036 1.879 0.060310 .
traitNarea -0.468677 0.185908 2.521 0.011701 *
traitNmass -0.244480 0.165871 1.474 0.140505
traitPmass -0.888061 0.248237 3.577 0.000347 ***
traitSLA 0.131673 0.198118 0.665 0.506294
elevation_log:traitLA 0.003795 0.033306 0.114 0.909281
elevation_log:traitLMA 0.066417 0.030266 2.194 0.028204 *
elevation_log:traitNarea 0.088490 0.029835 2.966 0.003017 **
elevation_log:traitNmass 0.055879 0.026963 2.072 0.038222 *
elevation_log:traitPmass 0.182724 0.041437 4.410 1.04e-05 ***
elevation_log:traitSLA -0.025019 0.032394 0.772 0.439920
(conditional average)
Estimate Std. Error z value Pr(>|z|)
intrcpt 0.115566 0.141088 0.819 0.412727
elevation_log -0.019907 0.022636 0.879 0.379163
traitLA -0.097898 0.204791 0.478 0.632624
traitLMA -0.353229 0.188036 1.879 0.060310 .
traitNarea -0.468677 0.185908 2.521 0.011701 *
traitNmass -0.244480 0.165871 1.474 0.140505
traitPmass -0.888061 0.248237 3.577 0.000347 ***
traitSLA 0.131673 0.198118 0.665 0.506294
elevation_log:traitLA 0.003795 0.033306 0.114 0.909281
elevation_log:traitLMA 0.066417 0.030266 2.194 0.028201 *
elevation_log:traitNarea 0.088491 0.029834 2.966 0.003016 **
elevation_log:traitNmass 0.055879 0.026962 2.073 0.038218 *
elevation_log:traitPmass 0.182726 0.041434 4.410 1.03e-05 ***
elevation_log:traitSLA -0.025019 0.032394 0.772 0.439918
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The above output shows the estimates of the model coefficients for the moderators based on the model averaging procedure; this is a typical format for model estimates in R and the interpretation is straightforward. We also provide a table (Table S8) showing the results of model averaging based on glmulti for comparison. We see that the results are exactly the same.
Table S8 The estimates of model slopes for moderators based on the model average inference method.
- Scale dependence
Scale dependence is a widespread issue in the environmental sciences, yet it has rarely been accounted for in current environmental meta-analysis practice. A common type of scale dependence arises when the sampling unit varies in size; for example, the quadrat or plot used to measure biodiversity can vary in area (e.g., 100 cm^2 or 1 km^2). Smaller plot sizes underestimate biodiversity differences, and hence the effect size statistics that estimate them. However, studies that use smaller plots tend to have greater replication than studies using larger plots. The scale dependence of any meta-analytic inference arises from the relationship among replication (sample size), plot size, and sampling variance \nu_i, which leads to covariation among the three. Essentially, scale dependence leads to biased estimates of the sampling variances, which violates a basic assumption of meta-analysis: sampling variances are assumed known because they (partially) serve as weights when averaging effect sizes across studies.
There is no established solution to this issue. Incidentally, our team (Rebecca Spake) has published a comprehensive simulation paper investigating scale dependence in the context of meta-analytic models:
Spake R, Mori A S, Beckmann M, et al. Implications of scale dependence for cross‐study syntheses of biodiversity differences[J]. Ecology Letters, 2021, 24(2): 374-390.
Based on the results of this paper and our experience, we give three practical and easy-to-implement suggestions to mitigate the issue of scale dependence:
- use lnRR rather than SMD as the effect size statistic to increase the accuracy of the effect size and sampling variance estimates when biodiversity is measured as, for example, richness or species density, and visually assess the relationships among effect size estimates, plot size, replication, and variance;
- conduct sensitivity analyses, for example by fitting an unweighted model or using a different weighting scheme (e.g., weighting by an ordinal classification of study quality or non-parametric weighting), to reduce the impact of inaccurate sampling variances; if using an unweighted model, your dataset should be free of publication bias, otherwise you will obtain biased model coefficients;
- code plot size as a moderator and use meta-regression to control for its effects alongside the effects of other moderators.
Unfortunately, our survey did not find any dataset that coded plot size as a moderator (Plot.Size). Therefore, we are unable to provide a corresponding illustration. But the implementation is straightforward - you just need to use the mods argument in the rma.mv() function and supply the Plot.Size variable to mods using a formula: mods = ~ Plot.Size.
License
This document is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International.
Software and package versions
R version 4.0.3 (2020-10-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: _LC_COLLATE=Chinese (Simplified)China.936, _LC_CTYPE=Chinese (Simplified)China.936, _LC_MONETARY=Chinese (Simplified)China.936, LC_NUMERIC=C and _LC_TIME=Chinese (Simplified)China.936
attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: formatR(v.1.11), pander(v.0.6.4), gridGraphics(v.0.5-1), png(v.0.1-7), cowplot(v.1.1.1), ggthemr(v.1.1.0), ggalluvial(v.0.12.3), visdat(v.0.5.3), ggsignif(v.0.6.3), plotly(v.4.9.4.1), networkD3(v.0.4), GoodmanKruskal(v.0.0.3), mice(v.3.13.0), patchwork(v.1.1.1), metagear(v.0.7), glmulti(v.1.0.8), leaps(v.3.1), rJava(v.1.0-4), MuMIn(v.1.43.17), clubSandwich(v.0.5.3), metafor(v.3.9-21), numDeriv(v.2016.8-1.1), metadat(v.1.2-0), Matrix(v.1.5-3), readxl(v.1.3.1), ggpubr(v.0.4.0), DT(v.0.19), here(v.1.0.1), forcats(v.0.5.2), stringr(v.1.4.0), dplyr(v.1.0.10), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.1), tibble(v.3.1.8), ggplot2(v.3.4.0), tidyverse(v.1.3.1), palmerpenguins(v.0.1.0), rmdformats(v.1.0.3) and knitr(v.1.37)
loaded via a namespace (and not attached): backports(v.1.2.1), plyr(v.1.8.6), igraph(v.1.3.0), lazyeval(v.0.2.2), splines(v.4.0.3), crosstalk(v.1.1.1), TH.data(v.1.1-0), digest(v.0.6.27), htmltools(v.0.5.2), fansi(v.0.5.0), magrittr(v.2.0.3), tzdb(v.0.1.2), openxlsx(v.4.2.4), modelr(v.0.1.8), sandwich(v.3.0-1), colorspace(v.2.0-0), rvest(v.1.0.1), haven(v.2.4.3), xfun(v.0.29), tcltk(v.4.0.3), crayon(v.1.4.1), jsonlite(v.1.7.2), survival(v.3.2-7), zoo(v.1.8-9), glue(v.1.6.2), gtable(v.0.3.0), emmeans(v.1.6.3), car(v.3.0-11), abind(v.1.4-5), scales(v.1.2.1), mvtnorm(v.1.1-3), DBI(v.1.1.1), rstatix(v.0.7.0), Rcpp(v.1.0.8.3), viridisLite(v.0.4.0), xtable(v.1.8-4), foreign(v.0.8-81), latex2exp(v.0.9.4), stats4(v.4.0.3), htmlwidgets(v.1.5.3), httr(v.1.4.2), ellipsis(v.0.3.2), farver(v.2.1.0), pkgconfig(v.2.0.3), sass(v.0.4.0), dbplyr(v.2.1.1), utf8(v.1.2.2), labeling(v.0.4.2), tidyselect(v.1.1.1), rlang(v.1.0.6), munsell(v.0.5.0), cellranger(v.1.1.0), tools(v.4.0.3), cli(v.3.4.1), generics(v.0.1.0), pacman(v.0.5.1), broom(v.1.0.1), mathjaxr(v.1.2-0), evaluate(v.0.14), fastmap(v.1.1.0), yaml(v.2.2.1), fs(v.1.5.2), zip(v.2.2.0), pbapply(v.1.4-3), nlme(v.3.1-151), xml2(v.1.3.2), compiler(v.4.0.3), rstudioapi(v.0.13), beeswarm(v.0.4.0), curl(v.4.3.2), reprex(v.2.0.1), bslib(v.0.3.0), stringi(v.1.7.4), highr(v.0.9), lattice(v.0.20-41), vctrs(v.0.5.0), pillar(v.1.8.1), lifecycle(v.1.0.3), jquerylib(v.0.1.4), estimability(v.1.3), data.table(v.1.14.0), R6(v.2.5.1), bookdown(v.0.24), rio(v.0.5.29), vipor(v.0.4.5), codetools(v.0.2-18), MASS(v.7.3-54), assertthat(v.0.2.1), rprojroot(v.2.0.2), withr(v.2.5.0), multcomp(v.1.4-17), parallel(v.4.0.3), mgcv(v.1.8-33), hms(v.1.1.0), coda(v.0.19-4), rmarkdown(v.2.11), carData(v.3.0-4), lubridate(v.1.7.10) and ggbeeswarm(v.0.6.0)