Quantitative evidence synthesis: a practical guide on meta-analysis, meta-regression, and publication bias tests for environmental sciences

A step-by-step tutorial along with theoretical explanation

Contributors

Shinichi Nakagawa, Yefeng Yang, Erin Macartney, Rebecca Spake, and Malgorzata Lagisz

Update

Last update January 2023

Preface

This step-by-step tutorial is a supplement to our methodological guideline paper invited by Environmental Evidence - the official journal of the Collaboration for Environmental Evidence (CEE).

Intended audience

Our tutorial is aimed at researchers with an interest in using meta-analytical techniques from any discipline and level of prior experience with meta-analysis. While our worked examples use questions and data from the environmental sciences, our guidelines and R tutorial contain explanations and point to the relevant statistical theory, so that the techniques can be applied in other scientific disciplines and statistical software (e.g., Python or Stata).

We will update this tutorial when necessary. Readers can access the latest version in our GitHub repository.

Credit and citation

This tutorial integrates much of the work and prior R scripts from researchers in Prof. Shinichi Nakagawa’s lab (see full publication list) and from (Associate) Prof. Wolfgang Viechtbauer’s versatile R package metafor (see the documentation, GitHub page, and package webpage).

We encourage the adoption of Open Science practices in the environmental sciences to make the field more open, reliable, and transparent, for example, through the sharing of data and code (for more information, see O’Dea et al. 2021).

If our paper and tutorial have helped you, please cite the following paper:

Shinichi Nakagawa, Yefeng Yang, Erin Macartney, Rebecca Spake, and Malgorzata Lagisz. Quantitative synthesis: a practical guide to meta-analysis, meta-regression, and publication bias tests for environmental sciences. Environmental Evidence, 2023.

Contact

If you have any questions, or mistakes or bugs to report, please contact the corresponding authors:

  • Dr. Yefeng Yang

Evolution & Ecology Research Centre (EERC), School of Biological, Earth and Environmental Sciences (BEES), The University of New South Wales, Sydney, Australia

Email:

  • Professor Shinichi Nakagawa, PhD, FRSN

Evolution & Ecology Research Centre (EERC), School of Biological, Earth and Environmental Sciences (BEES), The University of New South Wales, Sydney, Australia

Email:

Setting up R on your computer

Our tutorial uses R statistical software and existing R packages, which you will first need to download and install.

If you do not have it on your machine, first install R (download). We recommend also downloading RStudio, a popular integrated development environment for coding with R, created by Posit (download).

After installing R, you must install several packages that contain the functions needed for the analyses in this tutorial. If a package is available on CRAN, use install.packages() to install it. For example, to install the metafor package (a common meta-analysis package), execute install.packages("metafor") in the console (bottom left pane of RStudio). To install packages that are not on CRAN but archived in GitHub repositories, use devtools::install_github(). For example, to install orchaRd (a meta-analysis visualization package) from its GitHub repository, execute devtools::install_github("daniel1noble/orchaRd", force = TRUE).

Key package list: metafor, clubSandwich, orchaRd, MuMIn, glmulti, mice, metagear, tidyverse, here, DT, readxl, stringr, GoodmanKruskal, ggplot2, plotly, ggthemr, cowplot, grDevices, grid, gridGraphics, pander, formatR.

These packages might take a little while to install - you may wish to grab a cup of coffee while you wait!
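As a convenience, the CRAN packages can be installed in one go and the most frequently used ones loaded; below is a minimal sketch covering a subset of the list above (adapt the vector to your needs):

cran_pkgs <- c("metafor", "clubSandwich", "MuMIn", "glmulti", "mice", "metagear",
               "tidyverse", "here", "DT", "readxl", "stringr", "GoodmanKruskal",
               "ggplot2", "plotly", "cowplot", "pander", "formatR")
install.packages(setdiff(cran_pkgs, rownames(installed.packages()))) # install only what is missing
devtools::install_github("daniel1noble/orchaRd", force = TRUE)       # orchaRd lives on GitHub, not CRAN

library(metafor)      # effect size computation and meta-analytic models
library(clubSandwich) # robust variance estimation and VCV imputation
library(orchaRd)      # orchard plots, i2_ml(), r2_ml()
library(here)         # project-relative file paths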

Load custom functions

We also provide some additional helper functions that are necessary for our worked examples. The most straightforward way to use these custom functions is to source them with:

source(here("custom_func","custom_func.R"))

Alternatively, paste the source code (included in the file custom_func.R) into the console, and hit “Enter” to let R “learn” these custom functions.

Estimating an effect size statistic

Background to effect size estimation

Meta-analyses express the outcome of multiple studies on a common scale, through the calculation of an ‘effect size’ (\(z_j\) in the main text), representing the magnitude of a difference or the strength of a relationship. The effect size serves as the response (dependent) variable in a meta-analytical model. Several metrics of effect size are available, and the choice of metric should depend on the scientific question under investigation. We discuss three categories of effect size metrics:

  • Single-group effects

A statistical summary from one group; e.g., proportion, mean, log standard deviation (lnSD), log coefficient of variation (lnCV).

  • Comparative effects

An effect size that quantifies the magnitude of a difference in statistics between two groups. These include, for example, the standardised mean difference (SMD; well-known estimators: Hedges’ g or Cohen’s d), log response ratio (lnRR; aka ratio of means), mean difference (MD), risk (proportion) difference (RD), log odds ratio (lnOR), log variability ratio (lnVR), and log coefficient of variation ratio (lnCVR).

  • Association statistics

Quantify the strength of association between two variables; e.g., Fisher’s z-transformation of the correlation coefficient r (Zr).

The definitions, formulas and explanations of commonly used effect sizes (and their sampling variances, \(\nu_j\)) can be found in Table 3 in the main text.

Calculating effect sizes

For a chosen effect size, its point estimate (\(z_j\)) and sampling variance (\(\nu_j\)) can be calculated using the escalc() function in the metafor package.

Let’s load the dataset of our first worked example from [1], who meta-analysed the intraspecific change in seven morpho‐ecophysiological leaf traits along global elevational gradients.

dat_Midolo_2019 <- read.csv(here("data","Midolo_2019_Global Change Biology.csv"))

For simplicity, we will only keep the columns that are necessary to demonstrate the meta-analytic techniques of interest in this tutorial.

Table S1
The variables in the first worked example ([1]).

Table S1 includes the coded variables that were extracted from published papers.

You can find more details about the variables in the supplementary information of the paper by [1] (their Table S1), if interested. Here we classify the variables into three types:

  1. Bibliographic variables

Variables that distinguish between articles, such as the title (study_name), or publication year.

  2. Descriptive statistics

Variables that are used to compute effect sizes, such as mean (treatment and control), standard deviation (sd_treatment and sd_control) and sample size (n_treatment and n_control).

  3. Study-related characteristics

Variables that can contribute to random variability among the effect sizes, such as study identity (Study_ID), or to systematic variability between the effect sizes, such as treatment intensity or study location characteristics (e.g., elevation), which may be of interest as moderator variables.

After gaining an initial understanding of the extracted/coded variables, we can use escalc() to compute the effect sizes (using lnRR as an example).

Note that the argument measure is used to specify the effect size metric that we will calculate. For example, by specifying measure = "SMD" , the function will compute the point estimate of SMD (specifically, Hedges’ g) and its sampling variance.

Here are the most commonly used (and some underappreciated) effect sizes. For other types, you can look at the help page of escalc() by typing ?escalc in the R console.

  • measure = "ROM": lnRR - quantifies the difference between the means of two groups

  • measure = "SMD": SMD - quantifies the differences between the means of two groups

  • measure = "ZCOR": Zr - quantifies the strength of association between two variables

  • measure = "VR": lnVR - quantifies the differences in variances between two groups

  • measure = "CVR": lnCVR - quantifies the differences in variances between two groups, accounting for the mean-variance relationship

Note that lnVR and lnCVR are variation/dispersion-based effect sizes, which have practical implications for environmental studies. For example, environmental stressors such as pesticides and eutrophication are likely to increase variability in biological systems because stress accentuates individual differences in environmental responses.

lnRR <- escalc(measure = "ROM",  # "ROM" means ratio of means; lnRR is specified to be calculated (alternative effect sizes: "SMD" – SMD, "CVR" – lnCVR, "VR" – lnVR; see below);
               m1i = treatment, # mean value of group 1 (e.g., environmental stressor); in our worked example, m1i denotes the mean value of a trait measured at the higher elevation level;
               m2i = control, # mean value of group 2 (e.g., control); in our worked example, m2i denotes the mean of the same trait measured at the lower elevation level;
               sd1i = sd_treatment, # standard deviation of group 1 (e.g., environmental stressor)
               sd2i = sd_control, # standard deviation of group 2 (e.g., control) 
               n1i = n_treatment, # sample size of group 1 (e.g., environmental stressor) 
               n2i = n_control, # sample size of group 2 (e.g., control) 
               data = dat_Midolo_2019, # dataset containing the above information (here is the dataset of our working example)
               )
               

Let’s explore the effect size estimates and their sampling variances.

Table S2

The point estimates of the effect size (lnRR) and their sampling variances (lnRRV) for each included study and comparison (higher vs. lower elevation).

Choosing a meta-analytic model

Two traditional models

After computing the effect sizes and sampling variances, we can address the first aim of a meta-analysis - estimating the overall effect.

Two traditional meta-analytic models can be easily fitted via the function rma() in metafor. These are rarely appropriate, and we usually need a more complex model, called a multilevel meta-analytic model. However, we fit the traditional models here for comparison.

  • Fixed-effect model (‘common-effect’ or ‘equal-effect’ model; Equation 1)

\[ z_{j} = \beta_{0} + m_{j}, (1)\\ m_{j} \sim N(0,\nu_{j}) \] All notations can be found in the main text. The corresponding R code is:

mod_FE_lnRR <- rma(yi = lnRR, # calculated/estimated effect size are supplied; the outputs of escalc() function; 
                   vi = lnRRV, # calculated/estimated sampling variance of lnRR are supplied; 
                   method = "EE", # fixed-effect model is specified;
                   data = dat2_Midolo_2019 # our dataset
                   )

Below is the output of our fitted fixed-effect model. The Model Results section addresses the first aim of an environmental meta-analysis - estimating an overall mean effect size - including its magnitude (\(\beta_0\)), uncertainty (standard error \(SE[\beta_0]\)), significance test, and confidence intervals (CIs). We will explain how to properly interpret all the elements of the printed model results in the next section, on the multilevel meta-analytic model.


Equal-Effects Model (k = 1294)

     logLik     deviance          AIC          BIC         AICc   
-18108.6895   41267.0300   36219.3790   36224.5445   36219.3821   

I^2 (total heterogeneity / total variability):   96.87%
H^2 (total variability / sampling variability):  31.92

Test for Heterogeneity:
Q(df = 1293) = 41267.0300, p-val < .0001

Model Results:

estimate      se      zval    pval    ci.lb    ci.ub      
 -0.0144  0.0004  -34.0926  <.0001  -0.0152  -0.0135  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • Random-effects model (Equation 2)

\[ z_{j} = \beta_{0} + \mu_{j} + m_{j}, (2)\\ \mu_{j} \sim N(0,\tau^2), m_{j} \sim N(0,\nu_{j}) \] All notations can be found in the main text. Equation 2 can be fitted via rma with code:
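A minimal sketch of the call (same arguments as the fixed-effect model above; only the method argument changes, and the object name mod_RE_lnRR is just our own label for this sketch):

mod_RE_lnRR <- rma(yi = lnRR, # calculated/estimated effect sizes (output of escalc());
                   vi = lnRRV, # their sampling variances;
                   method = "REML", # random-effects model with the REML estimator (see the Technical Recommendation below);
                   data = dat2_Midolo_2019 # our dataset
                   )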

Again, we will elaborate on the interpretation of the output in a later section.



It should be noted that by fitting the above two traditional models, we assume that all k = 1294 effect sizes are independent. This assumption is violated, as some studies contributed more than one effect size, and effect sizes from the same study are not independent. We will elaborate on this in the next sections!

  • Technical Recommendation

Setting method = "REML" fits a random-effects model using the restricted maximum likelihood estimator to estimate the heterogeneity variance and the corresponding model coefficients (e.g., \(\beta_0\)). We can also fit a random-effects model with the above syntax but set the method argument to a different estimator for the variance components: method = "PM" - Paule-Mandel method; method = "ML" - maximum likelihood estimator; method = "DL" - DerSimonian-Laird estimator; method = "HE" - Hedges estimator. Generally, we recommend REML because it produces (approximately) unbiased estimates of the variance components (e.g., \(\tau^2\)) according to various simulation studies. Note, however, that the Paule-Mandel method has superior properties for common effect size measures, such as SMD, in the context of the random-effects model, and it is therefore recommended when fitting traditional random-effects models. The Paule-Mandel method does not generalize straightforwardly to multilevel models; put differently, it is not available in the rma.mv() function.
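For illustration, switching estimators only requires changing the method argument (a sketch; the object names are ours, and the REML fit mirrors the random-effects call sketched above):

mod_RE_PM <- rma(yi = lnRR, vi = lnRRV, method = "PM", data = dat2_Midolo_2019) # Paule-Mandel estimator
mod_RE_DL <- rma(yi = lnRR, vi = lnRRV, method = "DL", data = dat2_Midolo_2019) # DerSimonian-Laird estimator
round(c(REML = mod_RE_lnRR$tau2, PM = mod_RE_PM$tau2, DL = mod_RE_DL$tau2), 4)  # compare the tau^2 estimates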

Multilevel meta-analytic model

Brief theory

As we explain in the main text, a common statistical issue confronted by environmental meta-analysts is non-independence among data points (i.e., effect sizes). This is because each study (article) often contributes more than one effect size. To handle non-independent effect sizes, the simplest appropriate meta-analytic model for environmental sciences is the multilevel model:

\[ z_{i} = \beta_{0} + \mu_{j[i]} + e_{i} + m_{i}, (3)\\ \mu_{j[i]} \sim N(0,\tau^2), e_{i} \sim N(0,\sigma^2), m_{i} \sim N(0,\nu_{i}) \] The dataset of [1] (Table S1) has N = 71 primary studies with k = 1294 effect sizes. The ratio k/N ≈ 18 indicates statistical dependence: on average, each primary study contributes 18 effect sizes. See Figure 2 in the main text for more situations of statistical non-independence in environmental meta-analyses. Equation 3 adds a random-effects term, the within-study (effect-size-level) effect \(e_{i}\), to account for the non-independence due to multiple effect sizes contributed by the same study (i.e., clustering).
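These quantities can be checked directly from the data (a quick sketch, using the column names introduced above):

N <- length(unique(dat2_Midolo_2019$Study_ID)) # number of primary studies (N = 71)
k <- nrow(dat2_Midolo_2019)                    # number of effect sizes (k = 1294)
k / N                                          # ~18 effect sizes per study, on average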

Model fitting

We need to handle non-independence from the very beginning of our analysis. When preparing our data file (e.g., .CSV file), we need to structure our data in a way that permits the incorporation of non-independence among effect sizes. To do so, we need to code a unique identifier for each primary study (Study_ID: S1, S2, S3, …) and a unique identifier for each effect size (ES_ID: ES1, ES2, ES3, …).

With these variables coded, we can use the function rma.mv (rather than rma) to fit the multilevel meta-analytic model (Equation 3). Supplying Study_ID and ES_ID to the argument random, we can specify the two random-effects terms in Equation 3: the study-level effect \(\mu_{j[i]}\) (between-study effect) and the effect-size-level effect \(e_{i}\) (within-study effect).

We use a formula to define the random-effects structure supplied to random: it starts with ~ 1, followed by |, then an identifier denoting the grouping variable (e.g., Study_ID or ES_ID). The complete syntax to fit Equation 3 is:

mod_ML_lnRR <- rma.mv(yi = lnRR, 
                      V = lnRRV, 
                      random = list(~1 | Study_ID, # allows true effect sizes to vary among different primary studies - account for the between-study effect and quantify between-study heterogeneity;
                                    ~1 | ES_ID), # allows true effect sizes to vary within primary studies - accounts for the within-study effect and quantifies within-study heterogeneity;
                      method = "REML", # REML is assigned as the estimator for variance components as suggested;
                      data = dat2_Midolo_2019 # our dataset
                      )

Alternatively, you can use Study_ID/ES_ID to specify the random argument, which is a more straightforward way of expressing the nesting structure:

mod_ML_lnRR <- rma.mv(yi = lnRR, 
                      V = lnRRV, 
                      random = ~1 | Study_ID/ES_ID, # alternative syntax to specify random structure;
                      method = "REML", # REML is assigned as the estimator for variance components as suggested;
                      data = dat2_Midolo_2019 # our dataset
                      )

Results interpretation

The results of the fitted multilevel model (Equation 3) look like this:


Multivariate Meta-Analysis Model (k = 1294; method: REML)

   logLik   Deviance        AIC        BIC       AICc   
-138.1757   276.3514   282.3514   297.8456   282.3700   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0191  0.1381     71     no  Study_ID 
sigma^2.2  0.0515  0.2269   1294     no     ES_ID 

Test for Heterogeneity:
Q(df = 1293) = 41267.0300, p-val < .0001

Model Results:

estimate      se    zval    pval    ci.lb   ci.ub    
  0.0297  0.0194  1.5319  0.1255  -0.0083  0.0678    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let’s go through the results in order of appearance.

  • Multivariate Meta-Analysis Model details

We are first reminded of the model we have fitted:

k = the number of effect sizes (k) fed to the multilevel model.

method: REML = the REML method was specified as the estimation procedure used to obtain the model estimates (e.g., variance components, model coefficients).

  • Fit statistics

The next line contains several fit statistics, which can be used for model comparison, including:

logLik = restricted log-likelihood of the fitted model,

Deviance = goodness-of-fit statistic of the fitted model,

AIC = Akaike information criterion score of the fitted model,

BIC = Bayesian information criterion, and

AICc = AIC corrected for small sample sizes.

  • Variance Components

sigma^2.1 = between-study variance \(\tau^2\). In this example, this is conceptually equivalent to between-study heterogeneity variance \(\tau^2\) in a random-effects model (formulated as Equation 2).

sigma^2.2 = within-study variance \(\sigma^2\)

estim = the estimated amount of corresponding variance components

sqrt = square root of the variance components (i.e., their standard deviations)

nlvls = the number of levels for each random effect

factor = the name of the variables we supply to argument random to specify the random effects.

  • Test for Heterogeneity

Results of the heterogeneity test, which tests whether there are effect size differences in the data (based on Cochran’s Q-test, which is used to test the null hypothesis that all experimental studies have the same/equal effect).

p-val < 0.05 = effect sizes derived from the environmental studies are heterogeneous. It is also good to know how much of the heterogeneity is due to differences within studies, and how much is due to between-study differences.

  • Model Results

Results of model coefficients and corresponding model inference.

estimate = the estimate of the pooled/average/overall effect, also known as the grand mean or meta-analytic effect size (\(\beta_{0}\) in Equation 3).

se = the standard error of the estimate; here, \(SE[\beta_{0}]\).

zval = the value of test statistic (in our case: z-value; see Technical recommendation for our recommended practices).

ci.lb = lower boundary of (95% by default) confidence intervals (CIs).

ci.ub = upper boundary of the CIs.

Technical recommendation

The argument test in the functions rma.mv and rma is used to specify the method used to calculate test statistics and perform significance tests of model coefficients like \(\beta_0\) (confidence intervals and p-values). By default, rma.mv and rma set test = "z", which uses a test that assumes a normal distribution. However, when meta-analysing a small number of studies, setting test = "z" inflates the Type I error rate, leading to more false-positive results. To achieve nominal performance of the tests, it is highly recommended to set test = "t", which uses a t-distribution with k - p degrees of freedom to test model coefficients (e.g., \(\beta_0\)) and construct CIs (k = the number of effect sizes; p = the number of model coefficients). There are also other possible adjustments, for example, adjusting the degrees of freedom of the t-distribution, which can be implemented by setting dfs = "contain".

Therefore, a multilevel model with improved inference methods can be fitted with:

mod_ML_lnRR <- rma.mv(yi = lnRR, 
                      V = lnRRV, 
                      random = list(~1 | Study_ID, # allows true effect sizes to vary among different primary studies - account for the between-study effect and quantify between-study heterogeneity;
                                    ~1 | ES_ID), # allows true effect sizes to vary within primary studies - accounts for the within-study effect and quantifies within-study heterogeneity;
                      method = "REML", # REML is assigned as the estimator for variance components as suggested;
                      test = "t", # t-distribution is specified for the tests of model coefficients and CIs;
                      dfs = "contain", # the methods to adjust the (denominator) degrees of freedom;
                      data = dat2_Midolo_2019 # our dataset
                      )

In the corresponding output, under Model Results, zval becomes tval. Note that the values are essentially unchanged because this worked example has a large number of primary studies (N = 71), so the z-distribution (standard normal distribution) is approximately the same as the t-distribution. When the number of primary studies (N) is low, setting test = "t" and dfs = "contain" is highly recommended.

Statistical non-independence

Undesirable consequences

As we explain in the main text, failing to deal with statistical non-independence (see Figure 2 for details) leads to an underestimated standard error of \(\beta_0\), and therefore a spuriously small p-value and an inflated Type I error rate (our first example shows this problem; see below). Note that the estimate of the magnitude of \(\beta_0\) is not necessarily biased. However, the bias can be large and can even change the sign/direction of the effect size (as is the case with our second worked example; see below).

Below, we compare the results of the traditional models (fixed- and random-effects models) and the more appropriate multilevel meta-analytic model. We show how ignoring statistical non-independence by using the traditional models distorts the meta-analytic evidence. Let’s look at Table S3, which shows the results of the traditional models and the multilevel model.

Table S3
Comparison of the random-effects and multilevel meta-analytic models.

  • Biased results by ignoring statistical non-independence

From Table S3, we can see that conclusions based on our proposed multilevel model conflict with those reached by the random-effects model; the p-values and 95% CIs show that the overall effect (\(\beta_{0}\)) is statistically non-significant in the multilevel model, but statistically significant in the random-effects model.

This indicates that using a traditional model to fit non-independent effect sizes underestimates the standard error of the model coefficient (\(SE[\beta_{0}]\)) and distorts the corresponding statistical inference (e.g., producing a spuriously small p-value): \(SE[\beta_{0}]\) = 0.019 in the multilevel model vs. \(SE[\beta_{0}]\) = 0.008 in the random-effects model; p-value = 0.1255 in the multilevel model vs. p-value = 2e-04 in the random-effects model. The 95% CI in the multilevel model is wider than that in the random-effects model: [-0.008 to 0.068] in the multilevel model vs. [0.014 to 0.068] in the random-effects model. Therefore, if we had used a random-effects model to fit this dataset ([1]), we might have wrongly concluded that increasing elevation has, on average, a statistically significant effect on intraspecific leaf traits.

The multilevel model accounts for statistical non-independence due to multiple effect sizes derived from the same study. We can quantify the degree of dependence using an index called the intraclass correlation coefficient:

\[ ICC = \frac {\tau^2} {\tau^2 + \sigma^2} \]

The ICC index can reflect the intraclass correlation of the true effects within the same study. In the above fitted model, the ICC can be computed with:

mod_ML_lnRR$sigma2[1] / sum(mod_ML_lnRR$sigma2)
[1] 0.2703379

An ICC value of 0.2703 indicates that the underlying true effects (quantified as lnRR in this case) are weakly correlated within studies. However, the traditional models ignore this dependence and regard all true effects within the same study as independent (which would correspond to ICC = 0). As a consequence, the meta-analytic conclusions change qualitatively (from statistically significant under the traditional model to non-significant under the multilevel model), illustrating the inflated Type I error rate of the traditional models.

  • Mis-estimation of heterogeneity statistics

From Table S3, we can also see that the random-effects model gives misleading results with respect to the variance and heterogeneity among effect sizes. This is because the random-effects model tends to exaggerate the between-study variance (\(\tau^2\)): the value of \(\tau^2\) in the random-effects model (0.065) is more than three times that in the multilevel model (0.019). In other words, the random-effects model wrongly attributes within-study variance (\(\sigma^2\)) to between-study variance (\(\tau^2\)). Therefore, fitting this dataset with a random-effects model would lead to the wrong conclusion that there is a high amount of between-study heterogeneity, overlooking the fact that most of the variance and heterogeneity lies within studies (\(\sigma^2\) = 0.052 and \(I^2_{effect}\) = 72.73%; see the calculation of \(I^2_{effect}\) in the section Quantifying & explaining heterogeneity).

Variance-covariance matrix

As explained in the main text, when multiple effect sizes are contributed by a single article, there will be two broad types of non-independence (see Figure 2 for detailed cases):

  • non-independence among effect sizes

  • non-independence among sampling errors

The multilevel model we proposed (as in Equation 3) only deals with non-independence among effect sizes (\(z_{i}\)). For non-independence among sampling errors, for example due to shared control statistics, we can construct a variance-covariance (VCV) matrix to explicitly capture the non-zero covariance (\(Cov[\nu_{1[1]},\nu_{1[2]}]=\rho_{12}\sqrt{\nu_{1[1]}\nu_{1[2]}}\)) that arises from the correlation between sampling errors within the same primary study:

\[ \boldsymbol{M} = \begin{bmatrix} \nu_{1[1]} & \rho_{12}\sqrt{\nu_{1[1]}\nu_{1[2]}} & 0 \\ \rho_{21}\sqrt{\nu_{1[2]}\nu_{1[1]}} & \nu_{1[2]} & 0 \\ 0 & 0 & \nu_{2[3]} \end{bmatrix}, (4) \] Constructing the exact VCV matrix is usually impossible because the within-study correlation (e.g., \(\rho_{12}\)) is rarely reported in primary environmental studies, although exact covariances can be computed for some comparative effect size statistics (see Table S2). However, we can impute a VCV matrix by assuming a constant \(\rho\) across different environmental studies (\(\rho_{ik}=\cdots=\rho_{jh}\equiv\rho\)). With this simplifying assumption, we can impute the VCV matrix via the function impute_covariance_matrix() from the package clubSandwich:

VCV <- impute_covariance_matrix(vi = dat2_Midolo_2019$lnRRV, # sampling variances that are correlated with each other within the same study;
                                cluster = dat2_Midolo_2019$Study_ID, # study identity - clustering variable;
                                r = 0.5 # assuming that the effect sizes within the same study are correlated with rho = 0.5.
                                )

This can also be done via function vcalc() in metafor:

VCV <- vcalc(vi = lnRRV, # sampling variances that are correlated with each other within the same study;
             cluster = Study_ID,  # study identity - clustering variable;
             obs = ES_ID, # different effect sizes corresponding to the same response/dependent variable
             data = dat2_Midolo_2019, 
             rho = 0.5 # assuming that the effect sizes within the same study are correlated with rho = 0.5
             )

Setting rho = 0.5 means we assume that sampling errors within the same study are correlated with \(\rho\) = 0.5 (see the Technical Recommendation below). Let’s see what the imputed VCV matrix looks like, using the Bansal and Germino (2010) study as an example:

      [,1]  [,2]  [,3]  [,4]
[1,] 0.023 0.009 0.009 0.010
[2,] 0.009 0.013 0.007 0.007
[3,] 0.009 0.007 0.015 0.008
[4,] 0.010 0.007 0.008 0.017

As we can see, the VCV matrix is symmetric, with the sampling variances (\(\nu_{i}\)) on the diagonal and the covariances (\(Cov[\nu_{i},\nu_{k}]\)) off the diagonal.
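To inspect the block belonging to any one study, you can subset the matrix by the clustering variable (a sketch, assuming VCV is the full matrix returned by vcalc(); here the first study in the dataset is used):

rows <- dat2_Midolo_2019$Study_ID == unique(dat2_Midolo_2019$Study_ID)[1] # effect sizes from one study
round(VCV[rows, rows], 3) # variances on the diagonal, rho * sqrt(v_i * v_k) off the diagonal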

Let’s account for both non-independence among effect sizes and sampling errors:

mod_ML_lnRR_var <- rma.mv(yi = lnRR, 
                          V = VCV, # the imputed VCV matrix is supplied in place of the vector of sampling variances;
                          random = list(~1 | Study_ID, 
                                        ~1 | ES_ID), 
                          method = "REML", 
                          test = "t", 
                          data = dat2_Midolo_2019
                          )

We can explore whether the model coefficients and corresponding significance tests change:


Multivariate Meta-Analysis Model (k = 1294; method: REML)

   logLik   Deviance        AIC        BIC       AICc   
-107.3517   214.7034   220.7034   236.1976   220.7220   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0114  0.1066     71     no  Study_ID 
sigma^2.2  0.0516  0.2272   1294     no     ES_ID 

Test for Heterogeneity:
Q(df = 1293) = 65913.4258, p-val < .0001

Model Results:

estimate      se    tval    df    pval    ci.lb   ci.ub    
  0.0268  0.0170  1.5751  1293  0.1155  -0.0066  0.0602    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see that the overall effect (\(\beta_0\) = 0.03, 95% CI = [-0.01 to 0.06]) and the corresponding significance test (p-value = 0.1155) change little after accounting for non-independence among sampling errors. But this does not mean we can skip this step in general.

  • Technical Recommendation

Certain values of \(\rho\) have been recommended by several published papers, for example, rho = 0.5 or, more conservatively, rho = 0.8. We should check the robustness of our results against different values of \(\rho\), as sketched below (see Conducting sensitivity analysis & critical appraisal for a full pipeline for such a sensitivity analysis).
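A minimal sketch of such a check is to refit the model over a few plausible values of \(\rho\) and compare the estimates (column and object names as above):

sapply(c(0.3, 0.5, 0.8), function(r) {
  V_r <- vcalc(vi = lnRRV, cluster = Study_ID, obs = ES_ID, data = dat2_Midolo_2019, rho = r) # re-impute the VCV matrix
  m_r <- rma.mv(yi = lnRR, V = V_r,
                random = list(~1 | Study_ID, ~1 | ES_ID),
                method = "REML", test = "t", data = dat2_Midolo_2019) # refit the multilevel model
  c(rho = r, b0 = m_r$b[1], se = m_r$se[1], p = m_r$pval[1]) # does the conclusion change with rho?
})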

Robust variance estimation

Robust variance estimation (RVE) provides a way to include dependent effect sizes in meta-regression, even when the nature of the dependence structure is unknown. RVE methods do not require knowledge of the exact dependence structure between effect size estimates, and instead, RVE approximates the dependence structure. We recommend that RVE is applied within the framework of the multilevel model to make robust inference and to partition variance components to provide more insights into heterogeneity.

It is very straightforward to combine RVE with the multilevel model. We can supply the fitted multilevel model (an rma.mv object) to the coef_test() function from the clubSandwich package:

mod_ML_RVE <- coef_test(mod_ML_lnRR_var, # fitted multilevel model for which to make robust model inference (an object of class "rma.mv" can be directly supplied to coef_test());
                       vcov = "CR2", # ‘bias-reduced linearization’ is specified to approximate variance-covariance;
                       cluster = dat2_Midolo_2019$Study_ID # study identity - clustering variable
                       )

RVE computes robust standard errors and uses them for the subsequent null-hypothesis tests (t-stat, p-val) and confidence intervals of the model coefficients (\(\beta_{0}\)):

    Coef. Estimate     SE t-stat d.f. p-val (Satt) Sig.
1 intrcpt   0.0297 0.0194   1.53 64.7         0.13     

Alternatively, we can use function robust() in metafor to implement RVE:

robust(mod_ML_lnRR_var, 
       cluster = dat2_Midolo_2019$Study_ID, 
       clubSandwich = TRUE)

Multivariate Meta-Analysis Model (k = 1294; method: REML)

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0191  0.1381     71     no  Study_ID 
sigma^2.2  0.0515  0.2269   1294     no     ES_ID 

Test for Heterogeneity:
Q(df = 1293) = 41267.0300, p-val < .0001

Number of estimates:   1294
Number of clusters:    71
Estimates per cluster: 2-90 (mean: 18.23, median: 10)

Model Results:

estimate      se1    tval1     df1    pval1    ci.lb1   ci.ub1    
  0.0297  0.0194   1.5327   64.72   0.1302   -0.0090   0.0685     

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

1) results based on cluster-robust inference (var-cov estimator: CR2,
   approx t-test and confidence interval, df: Satterthwaite approx)

We can see that the RVE results from coef_test() are the same as those from robust().

  • Technical Recommendation

We do not yet have a definite recommendation on which method to use to account for non-independence among sampling errors (using the VCV matrix or RVE). This is because no simulation work in the context of multilevel meta-analysis has been done so far.

Quantifying & explaining heterogeneity

Measuring heterogeneity

The second aim of a meta-analysis is to quantify inconsistency (heterogeneity) among studies. In our main text, we recommend two ways of answering this question: using absolute and relative measures of heterogeneity.

  • Absolute heterogeneity measure

We can use variance components such as \(\tau^2\), which can be extracted directly from the outputs of the fitted model (see the section Results interpretation); \(\tau^2\) in a random-effects model (or \(\tau^2 + \sigma^2\) in a multilevel model) directly reflects the differences among the underlying true effects, because its square root can be interpreted as the standard deviation of the true effect sizes. An often neglected insight here is that the sum \(\tau^2 + \sigma^2\) denotes the total amount of variation, or heterogeneity, in the true effects. In this worked example ([1]), the variation in the true effects (\(\tau^2 + \sigma^2\)) is 0.0706, which is quite large given the magnitude of the overall effect (\(\beta_0\) = 0.0297). Put differently, the heterogeneity of the true effects is more than twice the magnitude of the overall effect (CV = 2.37; a short calculation sketch is given at the end of this subsection).

  • Relative heterogeneity measure

We can use \(I^2\) statistic to express variance due to differences between studies (not due to sampling variance) in the case of random-effects model (Equation 2):

\[ I^2=\frac{\tau^2} {\tau^2+\overline{\nu}}, (5) \] \[ \overline{\nu}=\frac{(N_{effect}-1)\sum_{i=1}^{k} 1/\nu_{i}} {(\sum_{i=1}^{k} 1/\nu_{i})^2-\sum_{i=1}^{k} 1/\nu_{i}^2}, (6) \]

In the case of multilevel model (Equation 3), the formulas can be written as:

\[ I^2_{total}=\frac{\tau^2+\sigma^2} {\tau^2+\sigma^2+\overline{\nu}}, (7) \] \(I^2_{total}\) can be further decomposed into variance due to differences between studies (\(I^2_{study}\)) and variance due to differences within studies (\(I^2_{effect}\)):

\[ I^2_{study}=\frac{\tau^2} {\tau^2+\sigma^2+\overline{\nu}}, (8) \] \[ I^2_{effect}=\frac{\sigma^2} {\tau^2+\sigma^2+\overline{\nu}}, (9) \] We have developed a function i2_ml() (orchaRd package) to compute Equations 7 to 9:

i2_ml(mod_ML_lnRR_VCV)
   I2_Total I2_Study_ID    I2_ES_ID 
   99.63261    17.97605    81.65656 

We see that there is little between-study heterogeneity (indicated by \(I^2_{study}\)) but substantial within-study heterogeneity (indicated by \(I^2_{effect}\)). If we had used a random-effects model to quantify heterogeneity, we might have concluded that the high heterogeneity is due to between-study differences (i.e., \(\tau^2\) and \(I^2\); see Table S3). The argument boot allows us to compute (percentile) CIs of \(I^2\). Note that calculating CIs of \(I^2\) may take a long time because i2_ml() uses a (parametric) bootstrapping method.
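To complement these relative measures, the absolute measures discussed above can be extracted straight from the fitted model object (a minimal sketch, using the multilevel model whose estimates were quoted earlier):

sum(mod_ML_lnRR$sigma2)                         # total heterogeneity: tau^2 + sigma^2 (= 0.0706 here)
sqrt(sum(mod_ML_lnRR$sigma2))                   # standard deviation of the true effects
sum(mod_ML_lnRR$sigma2) / abs(mod_ML_lnRR$b[1]) # heterogeneity relative to the overall effect (~2.37)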

Explaining variance with meta-regression

Continuous moderator

The third aim of a meta-analysis is to explain the heterogeneity. To answer this question, it is recommended to use meta-regression models rather than subgroup analysis (where one divides the dataset according to one predictor variable [e.g., sex: male vs. female] and conducts separate meta-analyses for each subset). In the context of meta-analysis, a predictor variable is often called a moderator, indicating that the magnitude of the overall effect may vary systematically as a function of that variable.

Building upon the multilevel model (Equation 3), a meta-regression model can be constructed by adding one moderator variable (AKA predictor, independent/explanatory variable, or fixed factor):

\[ z_{i} = \beta_{0} + \beta_{1}x_{1j[i]} + \mu_{j[i]} + e_{i} + m_{i}, (10)\\ \mu_{j[i]} \sim N(0,\tau^2), e_{i} \sim N(0,\sigma^2), m_{i} \sim N(0,\nu_{i}) \]

The model coefficient \(\beta_1\) denotes the slope of the moderator variable \(x_{1}\), quantifying how the moderator affects the magnitude of the overall effect.

If we are interested in the effects of more than one moderator, we can put the moderator variables into a single meta-regression model, leading to a multiple meta-regression (note that post-hoc hypotheses are generally not recommended, so you should have a clear a priori hypothesis about each variable rather than adopting a data-driven strategy):

\[ z_{i} = \beta_{0} + \sum \beta_{h}x_{h[i]} + \mu_{j[i]} + e_{i} + m_{i}, (11)\\ \mu_{j[i]} \sim N(0,\tau^2), e_{i} \sim N(0,\sigma^2), m_{i} \sim N(0,\nu_{i}) \]

We can fit a (multiple) meta-regression model (as formulated in Equations 10 and 11) using the function rma.mv(). The moderator, \(x_{1}\), can be supplied to the argument mods using the following formula syntax: a tilde ~ followed by the name of the moderator (e.g., mods = ~ x_1).

In our worked example, [1] examined how intraspecific leaf trait variation (measured as lnRR) changes in response to differences in elevation. To answer this question, we can construct a meta-regression model with the difference in elevation as a moderator, which is coded in the elevation column of the dataset (we log-transformed the elevation values to better meet distributional assumptions and labelled the new variable elevation_log). Setting mods = ~ elevation_log fits such a meta-regression model:

mod_MLMR_lnRR_elevation <- rma.mv(yi = lnRR, 
                                  V = VCV, 
                                  mods = ~ elevation_log, # adding elevation as a moderator variable (for continuous variables, it is good practice to log-transform them to reduce skewness);
                                  random = list(~1 | Study_ID, 
                                                ~1 | ES_ID), 
                                  method = "REML", 
                                  test = "t", 
                                  data = dat2_Midolo_2019,
                                  sparse = TRUE
                                  )

The output of the fitted meta-regression model is structured very similarly to that of the multilevel meta-analytic model (see Results interpretation under Multilevel meta-analytic model):


Multivariate Meta-Analysis Model (k = 1294; method: REML)

   logLik   Deviance        AIC        BIC       AICc   
-103.6834   207.3669   215.3669   236.0227   215.3979   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0098  0.0991     71     no  Study_ID 
sigma^2.2  0.0515  0.2269   1294     no     ES_ID 

Test for Residual Heterogeneity:
QE(df = 1292) = 65188.5301, p-val < .0001

Test of Moderators (coefficient 2):
F(df1 = 1, df2 = 1292) = 8.0348, p-val = 0.0047

Model Results:

               estimate      se     tval    df    pval    ci.lb    ci.ub     
intrcpt         -0.1507  0.0647  -2.3290  1292  0.0200  -0.2776  -0.0238   * 
elevation_log    0.0290  0.0102   2.8346  1292  0.0047   0.0089   0.0491  ** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here, we only discuss results that are different from those from the multilevel meta-analytic model.

  • Test for Residual Heterogeneity

QE = the test statistic used to test whether there is a large amount of “residual heterogeneity” among effect sizes. “Residual heterogeneity” is the amount of heterogeneity that is not explained by the included moderator (here, elevation).

p-val < .0001 = the residual heterogeneity is still substantial (statistically significantly larger than expected from sampling variance alone).

  • Test of Moderators (coefficient 2)

This section reports the omnibus (joint) test of the moderator coefficients. When the model includes an intercept, the intercept is excluded from this test by default, so here only the slope is tested:

\[ H_0:\beta_1=0 \]

coefficient 2 = the second model coefficient (the slope \(\beta_1\) of \(x_1\), i.e., elevation) is the one being tested.

F(df1 = 1, df2 = 1292) = the omnibus test is based on an F-distribution with m = 1 and k - p = 1294 - 2 = 1292 degrees of freedom (m = the number of model coefficients tested; p = the total number of model coefficients). We recommend using an F-distribution rather than a chi-square distribution to improve inference performance. To do so, set test = "t" rather than test = "z" (the default of rma.mv()).

p-val = 0.0047 = the p-value of this test, indicating that we can reject the null hypothesis that elevation has no effect on leaf traits.

  • Model Results

intrcpt = the intercept \(\beta_0\) in Equation 10. Note that this \(\beta_0\) is distinct from the \(\beta_0\) (i.e., the overall effect) in Equations 1 - 3. Here, \(\beta_0\) is the expected effect size (lnRR) when the (log-transformed) elevation is 0; in other words, setting \(x_1 = 0\) yields the intercept \(\beta_0\).

elevation_log = slope \(\beta_1\) of \(x_1\) (in this case elevation), denoting the estimated effect of the tested moderator \(x_1\).

With respect to the hypothesis tested in [1], the meta-regression confirms that elevation has a positive relationship with intraspecific leaf trait responses (quantified as lnRR).
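To see what this slope implies on the effect-size scale, we can obtain model predictions at chosen elevation values (a sketch; the elevation differences of 100, 500 and 1000 m are arbitrary illustrations):

predict(mod_MLMR_lnRR_elevation, newmods = log(c(100, 500, 1000))) # predicted lnRR (with CIs and PIs) at each elevation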

Categorical moderator

As explained in our main text, a categorical moderator can also be incorporated into a meta-regression by creating ‘dummy’ variables representing the various levels/subgroups (see the main text for an example). R creates dummy variables for us automatically. Therefore, parameterizing a meta-regression with a categorical moderator is the same as a meta-regression with a continuous moderator, with one exception: we can decide whether to keep or remove the intercept (see details below).

Let’s say we aim to investigate whether the effect of elevation differs depending on leaf trait type. This variable is coded as trait in the dataset and includes 7 levels: specific leaf area (SLA), leaf mass per area (LMA), leaf area (LA), nitrogen concentration per unit area (Narea), nitrogen concentration per unit mass (Nmass), phosphorus concentration per unit mass (Pmass) and carbon isotope composition (d13C).

Traditionally, we would divide the dataset into 7 subgroups and subsequently conduct 7 separate meta-analyses. A more powerful method is to fit a meta-regression with this categorical moderator (trait):

mod_MLMR_lnRR_trait <- rma.mv(yi = lnRR, 
                              V = VCV, 
                              mods = ~ trait -1, # setting '-1' is a useful strategy to remove intercept and then directly obtain the estimated effect (slope) for each subgroup or for a particular level within the categorical moderator.
                              random = list(~1 | Study_ID, 
                                                ~1 | ES_ID), 
                              method = "REML", 
                              test = "t", 
                              data = dat2_Midolo_2019
                              )

The corresponding model output is then:


Multivariate Meta-Analysis Model (k = 1294; method: REML)

  logLik  Deviance       AIC       BIC      AICc   
-71.6727  143.3454  161.3454  207.7860  161.4863   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0112  0.1058     71     no  Study_ID 
sigma^2.2  0.0486  0.2205   1294     no     ES_ID 

Test for Residual Heterogeneity:
QE(df = 1287) = 63967.9953, p-val < .0001

Test of Moderators (coefficients 1:7):
F(df1 = 7, df2 = 1287) = 11.4776, p-val < .0001

Model Results:

               estimate      se     tval    df    pval    ci.lb    ci.ub      
I(trait)dC13    -0.0136  0.0243  -0.5596  1287  0.5758  -0.0613   0.0341      
I(trait)LA      -0.0672  0.0275  -2.4493  1287  0.0144  -0.1211  -0.0134    * 
I(trait)LMA      0.0617  0.0251   2.4566  1287  0.0142   0.0124   0.1109    * 
I(trait)Narea    0.0820  0.0243   3.3816  1287  0.0007   0.0344   0.1296  *** 
I(trait)Nmass    0.0842  0.0214   3.9409  1287  <.0001   0.0423   0.1261  *** 
I(trait)Pmass    0.1633  0.0332   4.9155  1287  <.0001   0.0981   0.2285  *** 
I(trait)SLA     -0.0258  0.0241  -1.0702  1287  0.2848  -0.0732   0.0215      

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now, the results under Test of Moderators give the joint test of the null hypothesis that the coefficients of all 7 dummy variables (\(\beta_1\) to \(\beta_7\); there is no intercept \(\beta_0\) because we removed it via -1) are zero:

\[ H_0:\beta_1=\beta_2=\beta_3=\beta_4=\beta_5=\beta_6=\beta_7=0 \]

p-val < .0001 means the null hypothesis is rejected, suggesting that trait type does matter for the average effect of elevation (at least one trait subgroup has a mean effect that differs from zero).

The results printed under Model Results directly give the estimated effect for each of the 7 traits, because removing the intercept (via -1) means every level’s dummy variable enters the model and its coefficient estimates that level’s mean effect. We see that 5 traits (LA, LMA, Narea, Nmass and Pmass) have statistically significant effects, while the other 2 traits (dC13 and SLA) do not.

  • Technical Recommendation

Setting mods = ~ x (in this case, mods = ~ trait) keeps the intercept (\(\beta_0\)) in the model:

mod_MLMR_lnRR_trait2 <- rma.mv(yi = lnRR, 
                              V = VCV, 
                              mods = ~ I(trait),
                              random = list(~1 | Study_ID, 
                                            ~1 | ES_ID), 
                              method = "REML", 
                              test = "t", 
                              data = dat2_Midolo_2019
                              )

Multivariate Meta-Analysis Model (k = 1294; method: REML)

  logLik  Deviance       AIC       BIC      AICc   
-71.6727  143.3454  161.3454  207.7860  161.4863   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0112  0.1058     71     no  Study_ID 
sigma^2.2  0.0486  0.2205   1294     no     ES_ID 

Test for Residual Heterogeneity:
QE(df = 1287) = 63967.9953, p-val < .0001

Test of Moderators (coefficients 2:7):
F(df1 = 6, df2 = 1287) = 12.9754, p-val < .0001

Model Results:

               estimate      se     tval    df    pval    ci.lb   ci.ub      
intrcpt         -0.0136  0.0243  -0.5596  1287  0.5758  -0.0613  0.0341      
I(trait)LA      -0.0536  0.0304  -1.7666  1287  0.0775  -0.1132  0.0059    . 
I(trait)LMA      0.0753  0.0283   2.6639  1287  0.0078   0.0198  0.1307   ** 
I(trait)Narea    0.0956  0.0265   3.6063  1287  0.0003   0.0436  0.1477  *** 
I(trait)Nmass    0.0978  0.0242   4.0417  1287  <.0001   0.0503  0.1452  *** 
I(trait)Pmass    0.1769  0.0354   5.0013  1287  <.0001   0.1075  0.2463  *** 
I(trait)SLA     -0.0122  0.0285  -0.4285  1287  0.6684  -0.0681  0.0437      

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By default, R orders the levels of a categorical moderator alphabetically. In our case, dC13 is set as the reference level and its dummy variable is dropped from the meta-regression model. Why dC13 rather than another level? Because the letter “d” (dC13) comes before the other letters, for example, “L” (LA) and “S” (SLA). Therefore, the model intercept \(\beta_{0}\) (intrcpt) represents the estimated effect for the dC13 subgroup (\(\beta_0\) = -0.0136, 95% CI = [-0.0613 to 0.0341], p-value = 0.5758). The remaining model coefficients denote contrasts between the reference level (intrcpt, i.e., dC13) and each of the other 6 levels/groups. Therefore, keeping the intercept (i.e., not adding -1) allows us to test whether the different levels/subgroups differ from the reference level in terms of their average effect. If we aim to compare the estimated effect of one particular level against the others, we can set that level as the reference using relevel() or factor() prior to model fitting (see the sketch below). Alternatively, we can use the anova() function to obtain the contrasts. Note that if you use a subgroup analysis rather than the recommended meta-regression to explain heterogeneity, you have to construct such comparisons by hand, using a Wald-type test to calculate the test statistics and then performing the significance tests.
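For example, to make SLA (rather than dC13) the reference level before refitting the model with an intercept, one could run (a sketch; SLA is chosen arbitrarily):

dat2_Midolo_2019$trait <- relevel(factor(dat2_Midolo_2019$trait), ref = "SLA") # SLA becomes the reference level
# refitting mod_MLMR_lnRR_trait2 would then give contrasts of each remaining trait against SLA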

Goodness-of-fit

The goodness-of-fit index \(R^2\) can be used to quantify the percentage of variance explained by a moderator, and therefore indicates the importance of the examined moderator. It can also be used for model selection, to select the model with the ‘best’ set of moderators. A general and readily implementable form of \(R^2\) is the marginal version proposed by [2], which can be calculated as:

\[ R^2_{marginal}=\frac{f^2} {f^2+\tau^2+\sigma^2}, (12) \\ f^2 = Var(\sum \beta_{h}x_{h[i]}), (13) \] We wrote a function r2_ml() (in orchaRd package), which does the calculation of \(R^2_{marginal}\) :

r2_ml(mod_MLMR_lnRR_elevation)

   R2_marginal R2_conditional 
   0.009803104    0.168309094 

The result given under R2_marginal shows that elevation explains only 0.98% of the variation among effect sizes.

Notes on visualisation and interpretation

Classic forest plots

For environmental meta-analyses, a forest plot is often used to visualize the distribution of effect sizes and the 95% CIs for each study, as well as the overall effect based on the model (e.g., \(\beta_0\) and its 95% CIs). A classic forest plot (Figure S1) can be made with the forest() function in the metafor package:

forest(mod_ML_lnRR_VCV, # rma.mv object
       xlab = "Effect size lnRR")

At the bottom of the forest plot (Figure S1) made by forest(), there is a four-sided polygon (well known as the ‘diamond’) representing the overall effect based on the fitted model (in our case, mod_ML_lnRR_VCV - an rma.mv object containing the output of a multilevel model). The center of the diamond denotes the point estimate, and the left/right edges correspond to the lower and upper CI boundaries. However, we cannot clearly see these features because the number of studies is very large for this dataset (the plot gets a bit squashed). Recently, some new types of figures that build upon forest plots have been proposed, including the ‘caterpillar’ plot and the ‘orchard’ plot. See Underappreciated plots.

Figure S1
An example of forest plot showing the effect sizes from each study (and their 95% CIs) and the overall effect based on model.

Underappreciated plots

The orchard_plot() function in the orchaRd package can make a forest-like plot (termed an ‘orchard’ plot) that can accommodate a large number of studies.

A basic orchard plot can be made with:

orchard_plot(mod_ML_lnRR_VCV, # the multilevel model fitted above (rma.mv object)
             mod = "1", 
             xlab = "Effect size lnRR", 
             group = "Study_ID",  k = TRUE, g = TRUE,
             data = dat2_Midolo_2019) + 
             scale_x_discrete(labels = c("Overall effect (meta-analytic lnRR)"))

Figure S2
Orchard plot (forest-like plot) showing the effect sizes from each study (and their 95% CIs), the overall effect and its 95% CIs and 95% prediction intervals.

We see that an orchard plot is more informative than a classic forest plot. For example, it shows the distribution of the effect sizes, the number of effect sizes (k) and the number of studies (the number in brackets). An orchard plot not only visualizes the meta-analytic results, but also lets us see things we cannot see from the statistical results alone, such as influential data points and outliers that could threaten the robustness of our results. Further, an orchard plot shows not only the 95% CIs (thick whiskers in Figure S2) but also the 95% prediction intervals (PIs; thin whiskers), through which we can visually check the heterogeneity among effect sizes. This is because CIs only incorporate the uncertainty of the overall effect \(\beta_0\) (\(SE[\beta_0]\)), whereas PIs also incorporate the variance of the random effects:

\[ \text{95%CI} = \beta_{0} \pm t_{df[\alpha=0.05]} SE[\beta_{0}], (14) \]

\[ \text{95%PI} = \beta_{0} \pm t_{df[\alpha=0.05]} \sqrt{\tau^2+\sigma^2+ SE[\beta_{0}]^2}, (15) \] If you want to know the exact values of the PIs, you can use predict():
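For example (a minimal call, assuming the VCV-based multilevel model fitted earlier):

predict(mod_ML_lnRR_var) # returns the pooled estimate with its 95% CI and 95% PI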


   pred     se   ci.lb  ci.ub   pi.lb  pi.ub 
 0.0268 0.0170 -0.0066 0.0602 -0.4667 0.5203 
  • Visualize meta-regression with a categorical moderator

A useful feature of orchard_plot() is that it can also nicely show results based on a meta-regression model (Figure S3). The results of the meta-regression with trait as a moderator can be visualized with:

orchard_plot(mod_MLMR_lnRR_trait, 
             mod = "trait", 
             xlab = "Effect size lnRR", 
             group = "Study_ID",  k = TRUE, g = TRUE, trunk.size = 1.5, 
             data = dat2_Midolo_2019) + 
             scale_x_discrete(labels = c("SLA","Pmass","Nmass","Narea","LMA","LA","dC13"))

Figure S3
Orchard plot (forest-like plot) showing the effect sizes from each study and the overall effect for each subgroup/levels of a given categorical moderator.

  • Visualize meta-regression with a continuous moderator

For a meta-regression with a continuous moderator, we can use bubble_plot() to visualize the results. Let’s visualize the meta-regression with elevation as the continuous moderator:

bubble_plot(mod_MLMR_lnRR_elevation, 
            mod = "elevation_log", 
            xlab = "Elevation (log-transformed)", ylab = "Effect size lnRR",
            group = "Study_ID",k = TRUE, g = TRUE,
            data = dat2_Midolo_2019, legend.pos = "top.left") +
            theme(axis.text.x = element_text(size = 12, colour = "black"),
                  axis.text.y = element_text(size = 12, colour = "black"),
                  axis.title.x = element_text(size = 12, colour = "black"),
                  plot.title = element_text(size = 12, colour = "black"))

Figure S4
Bubble plot showing the results of a meta-regression model: relationship between a continuous moderator (in our case, elevation) and effect size magnitude (lnRR).

Checking for publication bias and robustness

Detecting and correcting for publication bias

  • Detecting small study effect

The most well-known form of publication bias is the small-study effect, where effect sizes from “small” studies, with low replication and therefore large uncertainty and low precision, show different, often larger, effects than large studies. A straightforward way to detect the small-study effect is to add the uncertainty of the effect size as a moderator, so that the relationship between an effect size and its uncertainty can be quantified. We propose formulating Egger’s regression (a classic method to detect funnel plot asymmetry) within the framework of the multilevel model to detect the small-study effect for dependent effect sizes:

\[ z_{i} = \beta_{0} + \beta_1\sqrt{\frac {1} {\tilde{n_i}}} + \mu_{j[i]} + e_{i} + m_{i}, (16) \] \[ z_{i} = \beta_{0} + \beta_1(\frac {1} {\tilde{n_i}}) + \mu_{j[i]} + e_{i} + m_{i}, (17) \]

Sampling error \(\sqrt{\nu_i}\) is a typical measure of the uncertainty of an effect size \(z_{i}\). However, for some types of effect size, for example SMD, \(z_{i}\) has an intrinsic relationship with its \(\nu_i\) (see Table S2). Therefore, \(\nu_i\) is not a valid moderator for detecting a small-study effect. In Equation 16, we instead use an adapted sampling error based on the effective sample size \(\tilde{n}\) as the moderator. Let’s calculate \(\tilde{n} = \frac {n_{iC}n_{iT}} {n_{iC}+n_{iT}}\) for SMD in our example (see Table S2 for formulas for other effect sizes):

ess.var_cal <- function(dat){1/dat$n_control + 1/dat$n_treatment} # helper function to calculate the adapted sampling variance based on the effective sample size (i.e., 1/n-tilde)
dat2_Midolo_2019$ess.var <- ess.var_cal(dat2_Midolo_2019) # calculate the adapted sampling variance (1/n-tilde)
dat2_Midolo_2019$ess.se <- sqrt(dat2_Midolo_2019$ess.var) # calculate the adapted sampling error (square root of 1/n-tilde)

Equation 16 can then be fitted with:

mod_MLMR_lnRR_ess.se <- rma.mv(yi = lnRR, 
                               V = VCV, 
                               mods = ~ ess.se, # add the adapted sampling error (square root of 1/n-tilde) as a moderator to test for the small-study effect. 
                               random = list(~1 | Study_ID, 
                                             ~1 | ES_ID), 
                               method = "REML", 
                               test = "t", 
                               data = dat2_Midolo_2019
                               )

The structure of the output from this model is exactly the same as that of any meta-regression model:


Multivariate Meta-Analysis Model (k = 1294; method: REML)

   logLik   Deviance        AIC        BIC       AICc   
-106.2100   212.4201   220.4201   241.0758   220.4511   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0112  0.1060     71     no  Study_ID 
sigma^2.2  0.0516  0.2272   1294     no     ES_ID 

Test for Residual Heterogeneity:
QE(df = 1292) = 65891.3832, p-val < .0001

Test of Moderators (coefficient 2):
F(df1 = 1, df2 = 1292) = 1.6336, p-val = 0.2014

Model Results:

         estimate      se     tval    df    pval    ci.lb   ci.ub    
intrcpt   -0.0353  0.0515  -0.6865  1292  0.4925  -0.1364  0.0657    
ess.se     0.1231  0.0963   1.2781  1292  0.2014  -0.0658  0.3120    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If you followed the earlier sections of this tutorial, we assume you can interpret these results by yourself. Briefly, if we look at ess.se given under Model Results: the p-value and 95% CI suggest that there is no statistical relationship between the effect size and its error, meaning that no small-study effect exists. The bubble plot (Figure S5) also indicates that there is no visual correlation between the effect size and its error - effect sizes are distributed symmetrically on the funnel plot (Figure S6).

Figure S5
Bubble plot showing the relationship between the effect size magnitude and its adjusted sampling error (effective sample size based). Small studies (low precision) do not report large effect sizes.

Figure S6
Visual inspection of the typical funnel plot to identify the small study effect.

During the peer review stage, one referee suggested using the ‘effective’ sample size rather than the inverse standard error. We agree with this point. Therefore, we added a new funnel plot using the effective sample size on the y-axis to this tutorial (Figure S6.2). It is important to note that funnel plots using either the inverse standard error or the ‘effective’ sample size are not reliable for detecting publication bias, because they are only visual checks. To properly detect publication bias, a regression-based statistical test needs to be used, as shown above.
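For reference, funnel plots of this kind can be drawn with metafor’s funnel() function (a minimal sketch, assuming the overall model object mod_ML_lnRR_VCV; the original plotting code is not shown here):

funnel(mod_ML_lnRR_VCV) # classic funnel plot with the standard error on the y-axis
funnel(mod_ML_lnRR_VCV, yaxis = "seinv") # funnel plot with precision (inverse standard error) on the y-axis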

Figure S6.2
Visual inspection of the funnel plot based on effective sample size to identify the small study effect.

  • Detecting decline effect

The decline effect, also known as time-lag bias, is another prominent form of publication bias, where effect sizes tend to get closer to zero over time. Testing for a decline effect is important because temporal changes in the evidence of a given field pose a threat to environmental policy-making, management, and practice. Decline effects can be tested with a meta-regression using publication year (centered to ease interpretation: \(c(year_{j[i]})\)) as a moderator:

\[ z_{i} = \beta_{0} + \beta_1c(year_{j[i]}) + \mu_{j[i]} + e_{i} + m_{i}, (18) \] Equation 18 can be fitted with code:

mod_MLMR_lnRR_Year.c <- rma.mv(yi = lnRR, 
                               V = VCV, 
                               mods = ~ Year.c, # add centered year as a moderator to detect decline effect.
                               random = list(~1 | Study_ID, 
                                             ~1 | ES_ID), 
                               method = "REML", 
                               test = "t", 
                               data = dat2_Midolo_2019,
                               sparse = TRUE
                               )

As you can see in the following results, the regression slope is Year.c = 0.002 (95% CI = [-0.0022, 0.0062]), which is very small and not statistically different from zero (t-value = 0.9135, p-val = 0.3611), suggesting that studies with statistically significant results were not published earlier than those with statistically non-significant results, i.e., there is no time-lag bias (Figure S7).


Multivariate Meta-Analysis Model (k = 1294; method: REML)

   logLik   Deviance        AIC        BIC       AICc   
-106.5039   213.0079   221.0079   241.6637   221.0390   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0114  0.1067     71     no  Study_ID 
sigma^2.2  0.0517  0.2273   1294     no     ES_ID 

Test for Residual Heterogeneity:
QE(df = 1292) = 65896.5620, p-val < .0001

Test of Moderators (coefficient 2):
F(df1 = 1, df2 = 1292) = 0.8345, p-val = 0.3611

Model Results:

         estimate      se    tval    df    pval    ci.lb   ci.ub    
intrcpt    0.0267  0.0170  1.5668  1292  0.1174  -0.0067  0.0601    
Year.c     0.0020  0.0021  0.9135  1292  0.3611  -0.0022  0.0062    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Figure S7
Bubble plot showing the relationship between the effect size magnitude and publication year (centered on 2008). There is no temporal trend in the changes of the effect size magnitude.

  • Correcting for publication bias

Besides detecting publication bias, it is necessary to correct for such bias to check the robustness of the model estimates. The intercept \(\beta_{0}\) in Equation 17 can be interpreted as the publication-bias-corrected overall effect (the ‘true’ effect), because \(\beta_{0}\) is the model coefficient when \(\frac {1} {\tilde{n}}\) = 0, which corresponds to a dataset with an infinite sample size (maximal precision) and thus no small-study effect. To obtain the publication-bias-corrected overall effect, we fit Equation 17 with:

mod_MLMR_lnRR_ess.var <- rma.mv(yi = lnRR, 
                                V = VCV, 
                                mods = ~ ess.var, # adding adjusted sampling error to obtain publication-bias-corrected overall effect - which is potentially regarded as 'true' effect.
                                random = list(~1 | Study_ID, 
                                              ~1 | ES_ID), 
                                method = "REML", 
                                test = "t", 
                                data = dat2_Midolo_2019,
                                sparse = TRUE
                                )

intrcpt printed under Model Results shows that the intercept \(\beta_0\) (-0.0029, 95% CI = [-0.0659, 0.0602]) is still statistically non-significant (p-value = 0.9285), although its magnitude and direction change. This suggests that the meta-analytic estimate for this dataset ([1]) is robust to publication bias (Table S4).


Multivariate Meta-Analysis Model (k = 1294; method: REML)

   logLik   Deviance        AIC        BIC       AICc   
-106.4500   212.8999   220.8999   241.5557   220.9310   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0113  0.1065     71     no  Study_ID 
sigma^2.2  0.0516  0.2272   1294     no     ES_ID 

Test for Residual Heterogeneity:
QE(df = 1292) = 65878.9167, p-val < .0001

Test of Moderators (coefficient 2):
F(df1 = 1, df2 = 1292) = 1.1843, p-val = 0.2767

Model Results:

         estimate      se     tval    df    pval    ci.lb   ci.ub    
intrcpt   -0.0029  0.0321  -0.0897  1292  0.9285  -0.0659  0.0602    
ess.var    0.1048  0.0963   1.0883  1292  0.2767  -0.0841  0.2937    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Table S4 Comparison of original meta-analytic estimates and bias-corrected meta-analytic estimates.

Accounting for heterogeneity when detecting publication bias

In our main text, we introduce Equation 19 to simultaneously detect two forms of publication bias while accounting for heterogeneity, which increases power and reduces the Type I error rate:

\[ z_{i} = \beta_{0} + \beta_1\sqrt{\frac {1} {\tilde{n_i}}} + \beta_2c(year_{j[i]}) + \sum \beta_{h}x_{h[i]} + \mu_{j[i]} + e_{i} + m_{i}, (19) \] To fit Equation 19 via rma.mv, we only need to include the adapted sampling error (\(\sqrt{\frac {1} {\tilde{n_i}}}\)), publication year (\(c(year_{j[i]})\)), and other important study characteristics as moderator variables in a meta-regression (i.e., a multiple meta-regression; see section Explaining variance with meta-regression). We arbitrarily select elevation and trait as additional moderators to demonstrate:

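The fitting code for this all-in-one model is not shown above, so here is a minimal sketch; we assume the fitted object is the mod_MLMR_lnRR_whole profiled later, use ess.var (i.e., 1/n-tilde) as in the output below, and remove the intercept so that every trait level receives its own coefficient:

mod_MLMR_lnRR_whole <- rma.mv(yi = lnRR, 
                              V = VCV, 
                              mods = ~ ess.var + Year.c + elevation_log + trait - 1, # all-in-one publication bias test (Equation 19); moderator names as assumed from the output below
                              random = list(~1 | Study_ID, 
                                            ~1 | ES_ID), 
                              method = "REML", 
                              test = "t", 
                              data = dat2_Midolo_2019,
                              sparse = TRUE
                              )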

Multivariate Meta-Analysis Model (k = 1294; method: REML)

  logLik  Deviance       AIC       BIC      AICc   
-67.0224  134.0449  158.0449  219.9377  158.2904   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0099  0.0996     71     no  Study_ID 
sigma^2.2  0.0486  0.2204   1294     no     ES_ID 

Test for Residual Heterogeneity:
QE(df = 1284) = 63164.1842, p-val < .0001

Test of Moderators (coefficients 1:10):
F(df1 = 10, df2 = 1284) = 8.9597, p-val < .0001

Model Results:

               estimate      se     tval    df    pval    ci.lb    ci.ub      
ess.var          0.0119  0.0936   0.1275  1284  0.8985  -0.1717   0.1956      
Year.c           0.0021  0.0021   1.0005  1284  0.3172  -0.0020   0.0061      
elevation_log    0.0284  0.0100   2.8385  1284  0.0046   0.0088   0.0481   ** 
traitdC13       -0.1900  0.0707  -2.6891  1284  0.0073  -0.3286  -0.0514   ** 
traitLA         -0.2446  0.0713  -3.4329  1284  0.0006  -0.3844  -0.1048  *** 
traitLMA        -0.1165  0.0710  -1.6399  1284  0.1013  -0.2559   0.0229      
traitNarea      -0.0962  0.0708  -1.3597  1284  0.1742  -0.2351   0.0426      
traitNmass      -0.0939  0.0699  -1.3432  1284  0.1795  -0.2311   0.0433      
traitPmass      -0.0150  0.0746  -0.2016  1284  0.8403  -0.1614   0.1313      
traitSLA        -0.2035  0.0698  -2.9160  1284  0.0036  -0.3404  -0.0666   ** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the above model output, we see that the slopes of \(\sqrt{\frac {1} {\tilde{n_i}}}\) (\(\beta_1\)) and \(c(year_{j[i]})\) (\(\beta_2\)) remain non-significant, albeit with changes in magnitude, confirming that there is no evidence of publication bias.

Technical Recommendation

Note that when running complex models, some model parameters cannot be estimated well. This is especially true for variance components. Therefore, it is good practice to check whether all model parameters are identifiable, which can be done with the profile() function in the metafor package:

profile(mod_MLMR_lnRR_whole, progbar = FALSE) # profile the variance components: between-study variance (sigma^2_1 in the following figure) and within-study variance (sigma^2_2)

From the variance component profiles, we see that the estimates are quite stable, as the likelihood drops quickly on either side of the maximum (vertical line). This basically means that the estimates of the model parameters are trustworthy, but this is not always the case for more complex meta-analytic models. Note that in this example, we used a quasi-Newton method to maximize the (log-)likelihood over parameters such as the variance components (REML does not have a closed-form solution, so we have to use numerical optimization techniques).

Conducting sensitivity analysis & critical appraisal

  • Sensitivity of assumption of within-study (sampling) correlation

Here, we show how to perform a sensitivity analysis to examine the extent to which the overall effect (e.g., \(\beta_{0}\)) is sensitive to the within-study (sampling) correlation \(\rho\) assumed when constructing the variance-covariance matrix (VCV).

First, set a series of \(\rho\) values (i.e., 0.3, 0.5, 0.7, 0.9); we choose these values arbitrarily, and you can set them based on expertise in your field:

rho_range <- c(0.3, 0.5, 0.7, 0.9)

Then, we write a loop to repeatedly run the specified model, changing \(\rho\) one value at a time:

ML_VCV_range <- list() # repeatedly run the specified model with varying rho
for (i in 1:length(rho_range)) {

VCV_range <- vcalc(vi = lnRRV, cluster = Study_ID, obs = ES_ID, 
                  rho = rho_range[i],
                  data = dat2_Midolo_2019
                  ) # impute the VCV matrix with varying rho
              
ML_VCV_range[[i]] <- rma.mv(yi = lnRR, 
                            V = VCV_range, # VCV matrix with varying values of rho. 
                            random = list(~1 | Study_ID, 
                                          ~1 | ES_ID), 
                            method = "REML", 
                            test = "t", 
                            data = dat2_Midolo_2019
                           )} # run model with different rho values.

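The estimates reported in Table S5 can then be collated from the fitted models, for example (a minimal sketch, not from the original tutorial code):

sens_rho <- data.frame(rho = rho_range,
                       est = sapply(ML_VCV_range, function(m) m$b[1]), # overall effect (beta_0)
                       ci.lb = sapply(ML_VCV_range, function(m) m$ci.lb[1]), # lower bound of the 95% CI
                       ci.ub = sapply(ML_VCV_range, function(m) m$ci.ub[1])) # upper bound of the 95% CI
sens_rho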
From Table S5, we see that the overall effect \(\beta_{0}\) barely changes across the assumed \(\rho\) values, indicating that the model estimates are robust to different assumptions about \(\rho\).

Table S5 Sensitivity analysis examining the robustness of the overall effect (i.e., \(\beta_0\)) to the assumption of \(\rho\) values.

  • Leave-one-out analysis

The sensitivity (robustness) of the results can be evaluated using leave-one-out analysis: repeatedly fitting the model to the dataset with one unit/entry (e.g., study) removed at a time. ‘One unit/entry’ denotes a row of the dataset fed to the model.

Leave-one-out analysis is a useful method to diagnose outliers and influential studies (those that exert disproportionate influence on model estimates such as the overall effect \(\beta_0\) and its 95% CI). As you might imagine, leave-one-out analysis re-fits the model N (the number of studies) times, so it may take quite a long time to finish if N is large.

For fixed and random effects models, the implementation of leave-one-out analysis can be easily performed using the leave1out() function from package metafor. The syntax is just one line:

leave1out(mod_FE_lnRR)

Let’s have a look at the results of the first 10 rows:


   estimate     se     zval   pval   ci.lb   ci.ub          Q     Qp      I2 
1   -0.0144 0.0004 -34.0889 0.0000 -0.0152 -0.0135 41265.3873 0.0000 96.8690 
2   -0.0144 0.0004 -34.0869 0.0000 -0.0152 -0.0135 41264.7609 0.0000 96.8690 
3   -0.0144 0.0004 -34.0954 0.0000 -0.0152 -0.0135 41266.2873 0.0000 96.8691 
4   -0.0144 0.0004 -34.0948 0.0000 -0.0152 -0.0135 41266.5291 0.0000 96.8691 
5   -0.0143 0.0004 -34.0330 0.0000 -0.0152 -0.0135 41257.2615 0.0000 96.8684 
6   -0.0143 0.0004 -33.9758 0.0000 -0.0151 -0.0135 41220.5405 0.0000 96.8656 
7   -0.0144 0.0004 -34.0714 0.0000 -0.0152 -0.0135 41264.3804 0.0000 96.8690 
8   -0.0143 0.0004 -34.0464 0.0000 -0.0152 -0.0135 41246.8031 0.0000 96.8676 
9   -0.0142 0.0004 -33.6629 0.0000 -0.0150 -0.0134 41245.2609 0.0000 96.8675 
10  -0.0147 0.0004 -34.8368 0.0000 -0.0155 -0.0139 41122.8168 0.0000 96.8582 
        H2 
1  31.9392 
2  31.9387 
3  31.9399 
4  31.9400 
5  31.9329 
6  31.9044 
7  31.9384 
8  31.9248 
9  31.9236 
10 31.8288 

For our proposed multilevel model, there is no ready-made package or function. However, you can modify the following code to implement leave-one-out analysis within the multilevel model framework. Later, we will write a corresponding function and wrap it into the orchaRd package. The syntax, deleting one study at a time, is:

dat2_Midolo_2019$leave_out <- as.factor(dat2_Midolo_2019$study_name) # create the variable that will be left out
leave1out_mod <- list() # create a list to hold the fitted models
VCV_leave1out <- list() # create a list to hold the VCV matrices used in model fitting
for(i in 1:length(levels(dat2_Midolo_2019$leave_out))){
  # create the data with one study removed at a time
  dat <- dat2_Midolo_2019 %>% filter(leave_out != levels(dat2_Midolo_2019$leave_out)[i])
  
  VCV_leave1out[[i]] <- impute_covariance_matrix(vi = dat$lnRRV, cluster = dat$Study_ID, r = 0.5) # impute the VCV matrix for the reduced dataset
  
  leave1out_mod[[i]] <- rma.mv(yi = lnRR, 
                                 V = VCV_leave1out[[i]], 
                                 random = list(~1 | Study_ID,
                                               ~1| ES_ID), 
                                 method = "REML", 
                                 test = "t",
                                 data = dat,
                                 sparse = TRUE)} # repeatedly fit the multilevel model, leaving out one study at a time


est.func <- function(mod){
  df <- data.frame(est = mod$b, lower = mod$ci.lb, upper = mod$ci.ub)
  return(df)
} # write a simple function to extract intercept and 95% CIs from each of the above fitted models

leave1out_results <- lapply(leave1out_mod, function(x) est.func(x)) %>% bind_rows %>% mutate(left_out = levels(dat2_Midolo_2019$leave_out)) # turn the model estimates in the format of data frame

The results of the leave-one-out analysis are shown in Figure S8, where we do not observe any outliers or influential studies with an obviously disproportionate effect on the model estimates.

Figure S8 Leave-one-out analysis showing overall effects \(\beta_0\) and 95% CIs based on dataset with one study left out at a time from model fitting.

The distribution of overall effect (\(\beta_0\)) after deleting one study at a time also confirms no obvious outlier (Figure S9).

Figure S9 Leave-one-out analysis showing distribution of overall effects \(\beta_0\) based on dataset with one study left out at a time from model fitting.

There are also other diagnostic indices that can be calculated from a leave-one-out analysis, for example, Cook’s distance, the difference between the regression coefficients (DFBETAS), and the hat value:

  • Cook’s distance = the Mahalanobis distance between the predicted (average) effects based on the full dataset and those based on the dataset with one study excluded from model fitting. A study may be considered an ‘outlier’ if the lower-tail area of a chi-square distribution (with degrees of freedom \(p\) = the number of model coefficients) cut off by the Cook’s distance is more than 50%.

  • DFBETAS = the change in the predicted (average) effect (expressed in standard deviations) after excluding a specific study from model fitting. A study may be considered an ‘outlier’ if the absolute value of DFBETAS is more than 1.

  • Hat value = the diagonal elements of the hat matrix that transforms the vector of observed effects into the vector of predicted effects. The hat value determines the magnitude of the residual for a study and can therefore help identify outlying studies. A study may be considered an ‘outlier’ if its hat value is more than \(3(\frac {p} {N})\) (where \(N\) = the number of studies).

One point is worth noting - these indices can be easily computed in R, but the above cut-offs are arbitrary. Therefore, if you want to use these indices, you need to make informed judgments. Below we show how to calculate these indices.

Cook’s distance can be calculated via cooks.distance():

cooks.distance(mod_ML_lnRR_VCV, cluster = Study_ID)

Figure S10 Cook’s distance showing how much all of the predicted effects in the model change when one study is deleted.

The calculation of the DFBETAS and hat values can be done with dfbetas() and hatvalues(), respectively. The syntax is just as simple as that of cooks.distance(). Given the extensive computation these methods require (our example dataset is very large), we do not show the results.
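For completeness, the corresponding calls would look like this (a sketch mirroring the cooks.distance() call above; we do not run them here):

dfbetas(mod_ML_lnRR_VCV, cluster = Study_ID) # DFBETAS for each study cluster
hatvalues(mod_ML_lnRR_VCV) # hat values for each effect size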

  • Quantitative critical appraisal

Besides qualitative methods to critically appraise the risk of bias of the primary studies included in a meta-analysis, we also recommend quantitative methods. When compiling the dataset, you need to extract and code variables related to risk-of-bias (RoB) characteristics, for example, the use of blinding, randomization, and selective reporting. You can then include these variables as moderators in a meta-regression. Unfortunately, at present, no environmental meta-analyses (at least among the papers we surveyed) have coded relevant moderators. Therefore, we are unable to provide a worked example. The implementation is nevertheless straightforward - you just need to use the mods argument in the rma.mv() function and supply the RoB variables to mods using a formula: mods = ~ RoB.
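A hypothetical sketch of such a model is given below; RoB and your_data are placeholder names (no surveyed dataset actually contains a coded RoB moderator):

rma.mv(yi = lnRR, 
       V = VCV, 
       mods = ~ RoB, # RoB: hypothetical coded risk-of-bias moderator (e.g., blinding: yes/no)
       random = list(~1 | Study_ID, 
                     ~1 | ES_ID), 
       method = "REML", 
       test = "t", 
       data = your_data) # your_data: hypothetical dataset containing the RoB column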

Our team (Shinichi Nakagawa) has published a paper that examined whether selective reporting inflates effect size estimates:

Parker T H, Greig E I, Nakagawa S, et al. Subspecies status and methods explain strength of response to local versus foreign song by oscine birds in meta-analysis[J]. Animal Behaviour, 2018, 142: 1-17.

In this paper, we ran a meta-regression with selective reporting as a moderator and found that selective reporting inflated the effect size estimates. This paper demonstrates exactly the need for the transparent reporting and open archiving that we advocate in our main text. We refer readers to the relevant code archived on the Open Science Framework to reproduce this quantitative critical appraisal.

Other relevant and advanced issues

Missing data

According to our survey, many environmental meta-analyses face the issue of missing values in some of the primary studies:

  1. missing standard deviations or sample sizes associated with the means, making effect size calculation infeasible;

  2. missing descriptions of study characteristics, making coding of moderator variables impossible.

A common approach to dealing with these issues is to delete the relevant studies, but this reduces the statistical power and precision of the model coefficients (e.g., \(\beta_0\) and \(\beta_1\)) due to the loss of sample size (in this case, the number of deleted studies), and limits the ability to reveal the drivers of heterogeneity between effect sizes due to the loss of explanatory variables (i.e., moderators).

Here, we show how to impute missing standard deviations, a common issue in environmental meta-analyses. We introduce two readily implementable techniques to impute standard deviations. Both methods are (partially) based on the mean-variance relationship (the strong correlation between standard deviation and mean).

The first method is simple multiple imputation, which imputes standard deviations using the coefficient of variation (CV) from all complete cases, and uses a resampling method to account for the uncertainty arising from using the CVs of the complete cases;

The second method is an improved method for imputing the standard deviations underlying lnRR, which uses a weighted average CV to improve the precision of the sampling variance estimates. We introduce an imputation method tailored to lnRR because lnRR is the most commonly used effect size statistic in environmental meta-analyses (for details, see the survey results in the main text).

Before demonstrating these methods, we must artificially create incomplete standard deviations in our dataset. We randomly select 20% of the studies and delete their standard deviations from both the control (sd_control) and treatment (sd_treatment) groups.

missing_SD <- dat2_Midolo_2019 # copy the dataset into missing_SD, which is used as the dataset for illustration of imputation method

set.seed(2022) # set the seed (for the random number generator) to make the following results fully reproducible; it is good practice to set the seed to make our results fully reproducible

studies <- sample(unique(missing_SD$study_name), size = 0.2*(length(unique(missing_SD$study_name)))) # randomly sample 20% of studies
    
missing_SD[which(missing_SD$study_name %in% studies), c("sd_treatment", "sd_control")] <- NA # create missingness of SD at the study level

Let’s have a look at the summary of artificially-created missingness in the dataset:
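The formatted summary below is produced by a helper whose call is not shown here; a quick base-R check gives the same counts and percentages:

colSums(is.na(missing_SD[, c("sd_treatment", "sd_control")])) # number of missing SDs per column
round(100 * colMeans(is.na(missing_SD[, c("sd_treatment", "sd_control")])), 1) # percent missingness per column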


Summary of missingness:

        COLUMN PERCENT_MISSINGNESS IMPUTATIONS
      Study_ID                   0           0
    study_name                   0           0
       species                   0           0
         trait                   0           0
     treatment                   0           0
       control                   0           0
  sd_treatment                  12         159
    sd_control                  12         159
   n_treatment                   0           0
     n_control                   0           0
     elevation                   0           0
 elevation_log                   0           0
          lnRR                   0           0
         lnRRV                   0           0
         ES_ID                   0           0
       ess.var                   0           0
        ess.se                   0           0
       n_tilde                   0           0
          Year                   0           0
        Year.c                   0           0
     leave_out                   0           0

Total missingness: 1.2% (318 imputations needed)

Below, we show the implementation of each method in turn.

  • Simple multiple imputation

Multiple imputation methods are generally complex, and a full description is beyond the scope of our paper. Many multiple imputation methods can be implemented via the powerful package mice; if you are interested in these advanced methods, have a look at its documentation (help(mice)). Here, we introduce a simple multiple imputation method, which relies on the strong correlation between standard deviations and means and uses a resampling approach to account for the uncertainty of the imputation itself, such that the Type I error rate can be reduced. This method can be implemented via the impute_SD() function in the metagear package. The syntax is:

dat_simple <- impute_SD(missing_SD, 
                        columnSDnames = c("sd_treatment", "sd_control"), # a string or list containing the labels of the column(s) with missing SDs;
                        columnXnames = c("treatment", "control"), # a string or list containing the labels of the column(s) with the mean values for each SD.
                        method = "HotDeck")

Summary of missingness:

        COLUMN PERCENT_MISSINGNESS IMPUTATIONS
      Study_ID                   0           0
    study_name                   0           0
       species                   0           0
         trait                   0           0
     treatment                   0           0
       control                   0           0
  sd_treatment                   0           0
    sd_control                   0           0
   n_treatment                   0           0
     n_control                   0           0
     elevation                   0           0
 elevation_log                   0           0
          lnRR                   0           0
         lnRRV                   0           0
         ES_ID                   0           0
       ess.var                   0           0
        ess.se                   0           0
       n_tilde                   0           0
          Year                   0           0
        Year.c                   0           0
     leave_out                   0           0

Total missingness: 0% (0 imputations needed)

We see that all the missing standard deviations (in both control and treatment groups) are imputed.

  • Improved imputation

Our team (Shinichi Nakagawa and Malgorzata Lagisz) recently developed a tailored method for imputing the missing standard deviations underlying lnRR. Our extensive simulations indicate that this method outperforms the simple method based on the mean-variance relationship. For details, see the following paper:

Nakagawa, S., Noble, D. W., Lagisz, M., Spake, R., Viechtbauer, W., & Senior, A. M. (2022, May 19). A robust and readily implementable method for the meta-analysis of response ratios with and without missing standard deviations. https://doi.org/10.32942/osf.io/7thx9

We expect this method to help environmental meta-analysts a great deal because lnRR is the most commonly used effect size statistic in the field. In brief, this method is based on a weighted average CV rather than a simple average CV. Using a weighted average CV gives some data points more weight than others, which improves the precision of the sampling variance estimates. Below, we borrow three custom functions written for the above paper to show the implementation.

cv_avg() = computes the weighted average (squared) CV within a study and the weighted average (squared) CV between studies

lnrr_laj() = computes the point estimate of lnRR based on a Taylor expansion

v_lnrr_laj() = computes the sampling variance of lnRR based on a second-order Taylor expansion

The full code is given below:

dat_improved <- missing_SD %>% 
                mutate(cv_control = na_if(sd_control / control, Inf),
                       cv_treatment = na_if(sd_treatment / treatment, Inf)) # first calculate the CV for each row; rows with missing SD yield NA, and infinite values (from division by a zero mean) are also set to NA

dat_improved <- cv_avg(x = control, sd = sd_control, n = n_control, 
                       group = study_name, 
                       label = "1", # control group
                       data = dat_improved) # calculate the average between-study CV, which will replace missing SD
    
dat_improved <- cv_avg(x = treatment, sd = sd_treatment, n = n_treatment,
                       group = study_name, 
                       label = "2", # treatment group
                       data = dat_improved)

dat_improved <- dat_improved %>%
                mutate(cv2_control_new = if_else(is.na(cv_control), b_CV2_1, cv_control^2),
                       cv2_treatment_new = if_else(is.na(cv_treatment), b_CV2_2, cv_treatment^2)) # use weighted between-study CV to replace missing CV (which is due to missing SD)


dat_improved <- dat_improved %>%
                mutate(lnRR_new = lnrr_laj(m1 = control, m2 = treatment, cv1_2 = cv2_control_new, cv2_2 = cv2_treatment_new, n1 = n_control, n2 = n_treatment),
                       lnRRV_new = v_lnrr_laj(cv1_2 = cv2_control_new, n1= n_control, cv2_2 = cv2_treatment_new, n2 = n_treatment)) # compute the new point estimate of lnRR and its sampling variance. This uses the weighted between-study CV^2 when the SD is missing and the ordinary CV^2 otherwise

We summarize the model estimates based on the above two imputation methods in Table S6.
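For example, the improved-imputation model behind Table S6 could be re-fitted as follows (a minimal sketch, not from the original code; the VCV matrix is re-imputed from the new sampling variances first):

VCV_improved <- vcalc(vi = lnRRV_new, cluster = Study_ID, obs = ES_ID, 
                      rho = 0.5, data = dat_improved) # re-impute the VCV matrix from the new sampling variances
mod_improved <- rma.mv(yi = lnRR_new, 
                       V = VCV_improved, 
                       random = list(~1 | Study_ID, 
                                     ~1 | ES_ID), 
                       method = "REML", 
                       test = "t", 
                       data = dat_improved)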

Table S6 Comparison of estimates of model coefficients based on datasets imputed with the simple multiple imputation method and the improved method.

Complex non-independence

  • Multiple sources of non-independence

Environmental meta-analytic datasets may contain many more ways in which studies can be non-independent of one another. Several ‘clustering variables’, other than study identity, may make studies dependent. As you can probably imagine, genus (or species) could be such a clustering variable in an environmental dataset with multiple species, as effect sizes derived from the same species are likely to be more similar to each other than effect sizes from different species due to similar genetics, shared history, or phylogenetic relatedness. The multilevel model has a flexible random effects structure that can handle complex non-independence, for example taxonomic dependence, by adding the associated clustering variables (e.g., species) as additional random effects:

\[ z_{i} = \beta_{0} + a_{k[i]} + s_{k[i]} + \mu_{j[i]} + e_{i} + m_{i}, (19) \] The random argument in rma.mv provides an elegant way to construct the random effects structure. For example, if we want to account for both study and species (taxonomic) effects, we supply them via the random argument and use list() to bind the clustering variables together: random = list(~ 1 | species, ~ 1 | Study_ID, ~ 1 | ES_ID). The complete code is:

rma.mv(yi = lnRR, 
       V = VCV, 
       random = list(~1 | species, # add species identity as a random effect, which allows effect sizes to vary among species; 
                     ~1 | Study_ID, # add study identity as a random effect, which allows effect sizes to vary among studies; 
                     ~1 | ES_ID), # add effect size identity as a random effect, which allows effect sizes to vary within studies. 
       method = "REML", 
       test = "t", 
       data = dat2_Midolo_2019
      )

The output is:


Multivariate Meta-Analysis Model (k = 1294; method: REML)

   logLik   Deviance        AIC        BIC       AICc   
-107.3514   214.7028   222.7028   243.3617   222.7338   

Variance Components:

            estim    sqrt  nlvls  fixed    factor 
sigma^2.1  0.0000  0.0064    109     no   species 
sigma^2.2  0.0113  0.1064     71     no  Study_ID 
sigma^2.3  0.0516  0.2272   1294     no     ES_ID 

Test for Heterogeneity:
Q(df = 1293) = 65913.4258, p-val < .0001

Model Results:

estimate      se    tval    df    pval    ci.lb   ci.ub    
  0.0268  0.0170  1.5730  1293  0.1160  -0.0066  0.0601    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see that the results given under Variance Components show the variance component for each of the three random effects, although there appears to be little heterogeneity among species.

Note

According to our survey, only a few (XX%) environmental meta-analyses used a multilevel model. Therefore, environmental meta-analysts may not be familiar with ‘random effects’. Many researchers have attempted to define the term (see Andrew Gelman’s nice summary). Here, we give an intuitive explanation in the context of an environmental meta-analysis:

When a variable in a meta-analytic model is specified as a random effect, we assume that it contributes noise (variation) to the overall mean and thus has a random effect on the overall mean. For example, treating ‘species’ as a random effect assumes that the true effects are heterogeneous among species and enables us to quantify how much variation there is among species. On the other hand, treating species as a fixed effect assumes that each species level has a systematic effect on the overall effect that is identical across studies; this is equivalent to asking: does one species respond more to an environmental stressor than others?

  • Selecting a ‘best’ random effects structure

Here, we introduce some rules of thumb to help decide how to specify random effects structures that capture the hierarchical data structure (e.g., non-independence due to ‘clustering’, ‘nesting’, or ‘crossing’), which may be relevant to your meta-analysis.

Rule 1

Consider whether the variable in question is a true source of heterogeneity according to your knowledge in your field - we are against solely using data-driven random effects structure without considering whether the tested variable has true heterogeneity.

Rule 2

A random effect should have > 5 levels, so that its variance can be estimated with little bias.

Rule 3

Examine whether specifying the random effects structure improves the model, for example by examining fit statistics (e.g., AIC and BIC) and goodness-of-fit (e.g., \(R^2_{marginal}\)).

Study identity (Study_ID) and effect size identity (ES_ID) are typical random effects for an environmental meta-analysis. Let’s use the dataset of [1] to empirically test this point and show how to select the ‘best’ random effects structure using information-theoretic approaches alongside likelihood ratio tests (implemented via the anova.rma() function). There are three candidate random effects:

  1. Effect size identity (ES_ID) - unique ID for each pairwise comparison for effect size calculation.

  2. Study identity (Study_ID) - unique ID for each included primary study.

  3. Species identity (species) - name of species included in the primary studies.

Let’s first fit a null model without including the above random effects candidates as the default reduced model:

null.re <- rma.mv(yi = lnRR, 
                  V = VCV, 
                  method = "ML", # Setting method = "ML" rather than "REML" when conduct model selection.
                  test = "t", 
                  data = dat2_Midolo_2019
                  )
Then, add study identity (Study_ID) as a random effect via argument random:
study <- rma.mv(yi = lnRR, 
                V = VCV, 
                random = ~1 | Study_ID,  
                method = "ML", 
                test = "t", 
                data = dat2_Midolo_2019
                )
Let’s conduct a likelihood ratio test to compare the two fitted models’ quality in terms of AIC, BIC, AICc, log-likelihood values:
anova.rma(null.re, study)

        df        AIC        BIC       AICc      logLik        LRT   pval 
Full     2 47642.9931 47653.3240 47643.0024 -23819.4965                   
Reduced  1 60148.2005 60153.3660 60148.2036 -30073.1003 12507.2075 <.0001 
                QE 
Full    65913.4258 
Reduced 65913.4258 

We see that the model (Full) with Study_ID as a random effect has a lower AIC value than the null model (Reduced). The likelihood ratio test indicates that Study_ID significantly improves model fit (pval < 0.0001). Note that we recommend using maximum likelihood (ML) rather than restricted maximum likelihood (REML) when using information criteria, because using likelihood-based methods (and hence information criteria) to compare models with different fixed effects that are fitted by REML will generally yield nonsense. This is an important point when doing model selection (see the next section). It is therefore important to remember to set method = "ML" rather than method = "REML" when comparing models with different random effects (and also fixed effects; see the next section).

In the same vein, we can examine whether ES_ID contributes to model fit:

es <- rma.mv(yi = lnRR, 
             V = VCV, 
             random = ~1 | ES_ID,  
             method = "ML", 
             test = "t", 
             data = dat2_Midolo_2019
             )
anova.rma(null.re, es)

        df        AIC        BIC       AICc      logLik        LRT   pval 
Full     2   270.8693   281.2003   270.8786   -133.4347                   
Reduced  1 60148.2005 60153.3660 60148.2036 -30073.1003 59879.3312 <.0001 
                QE 
Full    65913.4258 
Reduced 65913.4258 

From the AIC, LRT and pval given in the above output, we see that the model with ES_ID as a random effect (Full) is much better than the model without any random effect (Reduced). Therefore, effect size identity is an important random effect that our model should account for.

Next, let’s examine whether a model with both Study_ID and ES_ID as random effects is better than a model with only ES_ID as a random effect:
study.es <- rma.mv(yi = lnRR, 
                   V = VCV, 
                   random = list(~1 | Study_ID,  
                                 ~1 | ES_ID),  
                   method = "ML", 
                   test = "t", 
                   data = dat2_Midolo_2019
                   )
anova.rma(study.es, es)

        df      AIC      BIC     AICc    logLik     LRT   pval         QE 
Full     3 221.5487 237.0451 221.5673 -107.7743                65913.4258 
Reduced  2 270.8693 281.2003 270.8786 -133.4347 51.3207 <.0001 65913.4258 

As expected, the full model has a smaller AIC value than the reduced model, indicating that the model defining the nested random effects structure with Study_ID and ES_ID (i.e., accounting for non-independence; Full) has a better fit than the model ignoring non-independence (Reduced). Next, let’s explore whether species identity is an important random effect. First, add the coded variable species as a random effect term via the random argument:

species.study.es <- rma.mv(yi = lnRR, 
                              V = VCV, 
                              random = list(~1 | species,
                                            ~1 | Study_ID,  
                                            ~1 | ES_ID),  
                             method = "ML", 
                             test = "t", 
                             data = dat2_Midolo_2019
                             )
anova.rma(species.study.es,study.es)

        df      AIC      BIC     AICc    logLik    LRT   pval         QE 
Full     4 223.5484 244.2103 223.5794 -107.7742               65913.4258 
Reduced  3 221.5487 237.0451 221.5673 -107.7743 0.0003 0.9864 65913.4258 

Looking at the above output, we see that adding species as a random effect does not improve the AIC value. This suggests that intraspecific leaf trait responses are consistent across plant species; that is, there is only a small amount of heterogeneity among species. This is easily corroborated by calculating \(I^2\) at the species level:

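We assume the \(I^2\) values below were obtained with the i2_ml() function from the orchaRd package, applied to the model with species, Study_ID and ES_ID as random effects (the call itself is not shown in the original):

i2_ml(species.study.es) # multilevel I^2: total and per random effect (species, study, effect size)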
   I2_Total  I2_species I2_Study_ID    I2_ES_ID 
99.63048070  0.04486139 17.45782398 82.12779532 

Model selection and model-averaging

As said in our main text, model selection is a powerful method to:

  • quantify the importance of moderators in explaining heterogeneity, which is useful when looking for, for example, global drivers of environmental change;

  • multimodel inference, which makes inferences about the moderators in the context of models with all possible combinations of moderators rather than a single ‘best’ model;

  • multimodel predictions, which can predict (average) effects of a moderator and its CIs based on models with all possible combinations of moderators rather than a single ‘best’ model.

Below we show how to conduct model selection and multimodel inference using an information-theoretic approach (although \(R^2\)-based model selection is a reasonable option, it is not preferable here). To do so, the metafor package needs to borrow functionality from model-selection-dedicated packages such as MuMIn and glmulti. We will use the MuMIn package for illustration because the corresponding syntax is more straightforward than that of glmulti. Let’s stay with the dataset of [1]. This dataset only has two moderators, elevation and trait, so we include an additional term - their interaction - to illustrate a more complex model. We would like to emphasize one point again - in your meta-analysis, you need to select your moderators based on their a priori plausibility (the predefined environmental questions you are going to address in your analyses) rather than adding or removing moderators until you get a significant model.

  • Selecting a ‘best’ model

By the ‘best’ model, we mean the model with an acceptable amount of information loss when used to approximate the true data-generating mechanism. To select the best model, first, we need to fit a multilevel multi-moderator meta-analytic model with all plausible moderators (Equation 19; the full model). Then, we dredge the full model to produce models with all possible combinations of the moderators in the full model (note that you need to use maximum likelihood (ML) rather than restricted ML; see the earlier section for an explanation):

mod.full <- rma.mv(yi = lnRR, 
                   V = lnRRV, 
                   mods = ~ trait * elevation_log,
                   random = list(~ 1 | Study_ID, ~ 1 | ES_ID), 
                   method="ML", 
                   test = "t",
                   data = dat2_Midolo_2019) # fit Equation 19 - multilevel multi-moderator meta-analytic model (full model with all plausible moderators).

eval(metafor:::.MuMIn) # use eval() function to extract helper functions from MuMIn and make them usable in metafor.

mod.candidate <- dredge(mod.full, beta = "none", evaluate = TRUE, rank = "AICc", trace=2) # dredge to produce all possible models


From Table S7, we see the fit statistics (log-likelihood and AICc) of all possible models.

Table S7 Fit statistics of models with different combinations of moderators included in the full model (Equation 19 - multilevel multi-moderator meta-analytic model).

Next, let’s select a set of best models. This can be done using a rule of thumb, for example, delta AICc <= 2:

subset(mod.candidate, delta <= 2, recalc.weights = FALSE)
Global model call: rma.mv(yi = lnRR, V = lnRRV, mods = ~trait * elevation_log, random = list(~1 | 
    Study_ID, ~1 | ES_ID), data = dat2_Midolo_2019, method = "ML", 
    test = "t", sparse = TRUE)
---
Model selection table 
  (Int)  elv_log trt elv_log:trt df  logLik  AICc delta weight
8     + -0.01991   +           + 16 -74.686 181.8     0      1
Models ranked by AICc(x) 
  • Model averaging

We can also make multimodel inferences via model averaging. You might remember that in the earlier sections, all model inference (tests of model slopes, e.g., \(\beta_1\)) was based on classic null hypothesis testing. The advantage of model averaging is that it takes into account the “weights” of multiple plausible models. The “weights” are the “Akaike weights”, which can be interpreted as the probability that a specific model has the least information loss (see the explanation above). Put differently, you can treat the weights as model probabilities. Note that this dataset is not an ideal example to illustrate model averaging because the weight of the first best model is almost 1 and the weights of the other models are 0 (look at the last column of Table S7). Unfortunately, we have very limited datasets from our survey due to the low data sharing rate in the environmental sciences. In any case, the logic and syntax are the same:

summary(model.avg(mod.candidate))

Call:
model.avg(object = mod.candidate)

Component model call: 
rma.mv(yi = lnRR, V = lnRRV, mods = ~<5 unique rhs>, random = list(~1 | 
     Study_ID, ~1 | ES_ID), data = dat2_Midolo_2019, method = ML, test = t, 
     sparse = TRUE)

Component models: 
       df  logLik   AICc  delta weight
123    16  -74.69 181.80   0.00      1
12     10  -92.66 205.50  23.70      0
2       9  -95.86 209.86  28.06      0
1       4 -135.48 279.00  97.20      0
(Null)  3 -138.73 283.48 101.68      0

Term codes: 
      elevation_log               trait elevation_log:trait 
                  1                   2                   3 

Model-averaged coefficients:  
(full average) 
                          Estimate Std. Error z value Pr(>|z|)    
intrcpt                   0.115566   0.141088   0.819 0.412727    
elevation_log            -0.019907   0.022636   0.879 0.379164    
traitLA                  -0.097898   0.204791   0.478 0.632624    
traitLMA                 -0.353229   0.188036   1.879 0.060310 .  
traitNarea               -0.468677   0.185908   2.521 0.011701 *  
traitNmass               -0.244480   0.165871   1.474 0.140505    
traitPmass               -0.888061   0.248237   3.577 0.000347 ***
traitSLA                  0.131673   0.198118   0.665 0.506294    
elevation_log:traitLA     0.003795   0.033306   0.114 0.909281    
elevation_log:traitLMA    0.066417   0.030266   2.194 0.028204 *  
elevation_log:traitNarea  0.088490   0.029835   2.966 0.003017 ** 
elevation_log:traitNmass  0.055879   0.026963   2.072 0.038222 *  
elevation_log:traitPmass  0.182724   0.041437   4.410 1.04e-05 ***
elevation_log:traitSLA   -0.025019   0.032394   0.772 0.439920    
 
(conditional average) 
                          Estimate Std. Error z value Pr(>|z|)    
intrcpt                   0.115566   0.141088   0.819 0.412727    
elevation_log            -0.019907   0.022636   0.879 0.379163    
traitLA                  -0.097898   0.204791   0.478 0.632624    
traitLMA                 -0.353229   0.188036   1.879 0.060310 .  
traitNarea               -0.468677   0.185908   2.521 0.011701 *  
traitNmass               -0.244480   0.165871   1.474 0.140505    
traitPmass               -0.888061   0.248237   3.577 0.000347 ***
traitSLA                  0.131673   0.198118   0.665 0.506294    
elevation_log:traitLA     0.003795   0.033306   0.114 0.909281    
elevation_log:traitLMA    0.066417   0.030266   2.194 0.028201 *  
elevation_log:traitNarea  0.088491   0.029834   2.966 0.003016 ** 
elevation_log:traitNmass  0.055879   0.026962   2.073 0.038218 *  
elevation_log:traitPmass  0.182726   0.041434   4.410 1.03e-05 ***
elevation_log:traitSLA   -0.025019   0.032394   0.772 0.439918    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The above output shows all the estimates of the model coefficients for the moderators based on the model averaging procedure. It is a typical format for presenting model estimates in R, and the interpretation is straightforward. We also provide a table (Table S8) showing the results of model averaging based on glmulti for comparison. We see that the results are exactly the same.

Table S8 The estimates of model slopes for moderators based on the model average inference method.

Scale dependence

Scale dependence is a widespread issue in the environmental sciences, yet it has rarely been accounted for in current environmental meta-analysis practice. A common type of scale dependence arises when the sampling unit varies in size; for example, a quadrat or plot used to measure biodiversity can vary in size (e.g., 100 \(cm^2\) or 1 \(km^2\)). Smaller plot sizes underestimate biodiversity differences, and hence the effect size statistics that estimate them. However, studies that use smaller plots tend to have greater replication than studies using larger plots. The scale dependence of any meta-analytic inference arises from the relationship among replication (sample size), plot size, and sampling variance \(\nu_i\), which leads to covariation among the three. Basically, scale dependence leads to biased estimates of the sampling variances, which violates a basic assumption of meta-analysis: sampling variances are assumed known because they (partially) serve as weights when averaging effect sizes across studies.

There is no established solution to this issue. However, our team (Rebecca Spake) has published a comprehensive simulation paper investigating scale dependence in the context of meta-analytic models:

Spake R, Mori A S, Beckmann M, et al. Implications of scale dependence for cross‐study syntheses of biodiversity differences[J]. Ecology Letters, 2021, 24(2): 374-390.

Based on the results of this paper and our experience, we give three practical and easy-to-implement suggestions to mitigate the issue of scale dependence:

    1. use lnRR rather than SMD as the effect size statistic to increase the accuracy of the effect size and sampling variance estimates for measures of biodiversity such as species richness or species density, and visually assess the relationships among effect size estimates, plot size, replication, and variance;
    2. conduct a sensitivity analysis, for example by fitting an unweighted model or using a different weighting scheme (e.g., weighting by an ordinal classification of study quality or non-parametric weighting) to remove the impact of inaccurate sampling variances (see the sketch after this list); if using an unweighted model, your dataset should be free of publication bias, otherwise you will obtain biased model coefficients;
    3. code plot size as a moderator and use meta-regression to control for its effects alongside the effects of other moderators.
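As a minimal sketch of the unweighted sensitivity model suggested in point 2 (an assumption, not from the original code), equal user-defined weights can be supplied via the W argument of rma.mv():

mod_unweighted <- rma.mv(yi = lnRR, 
                         V = lnRRV, 
                         W = rep(1, nrow(dat2_Midolo_2019)), # equal weights for all effect sizes
                         random = list(~1 | Study_ID, 
                                       ~1 | ES_ID), 
                         method = "REML", 
                         test = "t", 
                         data = dat2_Midolo_2019)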

Unfortunately, our survey did not find any dataset that coded plot size as a moderator (Plot.Size). Therefore, we are unable to provide a corresponding illustration. But remember the implementation is straightforward - you just need to use the mods argument in the rma.mv() function and supply the Plot.Size variable to mods using a formula: mods = ~ Plot.Size.

License

This document is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International.

Software and package versions

R version 4.0.3 (2020-10-10)

Platform: x86_64-w64-mingw32/x64 (64-bit)

locale: LC_COLLATE=Chinese (Simplified)_China.936, LC_CTYPE=Chinese (Simplified)_China.936, LC_MONETARY=Chinese (Simplified)_China.936, LC_NUMERIC=C and LC_TIME=Chinese (Simplified)_China.936

attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: formatR(v.1.11), pander(v.0.6.4), gridGraphics(v.0.5-1), png(v.0.1-7), cowplot(v.1.1.1), ggthemr(v.1.1.0), ggalluvial(v.0.12.3), visdat(v.0.5.3), ggsignif(v.0.6.3), plotly(v.4.9.4.1), networkD3(v.0.4), GoodmanKruskal(v.0.0.3), mice(v.3.13.0), patchwork(v.1.1.1), metagear(v.0.7), glmulti(v.1.0.8), leaps(v.3.1), rJava(v.1.0-4), MuMIn(v.1.43.17), clubSandwich(v.0.5.3), metafor(v.3.9-21), numDeriv(v.2016.8-1.1), metadat(v.1.2-0), Matrix(v.1.5-3), readxl(v.1.3.1), ggpubr(v.0.4.0), DT(v.0.19), here(v.1.0.1), forcats(v.0.5.2), stringr(v.1.4.0), dplyr(v.1.0.10), purrr(v.0.3.4), readr(v.2.1.2), tidyr(v.1.2.1), tibble(v.3.1.8), ggplot2(v.3.4.0), tidyverse(v.1.3.1), palmerpenguins(v.0.1.0), rmdformats(v.1.0.3) and knitr(v.1.37)

loaded via a namespace (and not attached): backports(v.1.2.1), plyr(v.1.8.6), igraph(v.1.3.0), lazyeval(v.0.2.2), splines(v.4.0.3), crosstalk(v.1.1.1), TH.data(v.1.1-0), digest(v.0.6.27), htmltools(v.0.5.2), fansi(v.0.5.0), magrittr(v.2.0.3), tzdb(v.0.1.2), openxlsx(v.4.2.4), modelr(v.0.1.8), sandwich(v.3.0-1), colorspace(v.2.0-0), rvest(v.1.0.1), haven(v.2.4.3), xfun(v.0.29), tcltk(v.4.0.3), crayon(v.1.4.1), jsonlite(v.1.7.2), survival(v.3.2-7), zoo(v.1.8-9), glue(v.1.6.2), gtable(v.0.3.0), emmeans(v.1.6.3), car(v.3.0-11), abind(v.1.4-5), scales(v.1.2.1), mvtnorm(v.1.1-3), DBI(v.1.1.1), rstatix(v.0.7.0), Rcpp(v.1.0.8.3), viridisLite(v.0.4.0), xtable(v.1.8-4), foreign(v.0.8-81), latex2exp(v.0.9.4), stats4(v.4.0.3), htmlwidgets(v.1.5.3), httr(v.1.4.2), ellipsis(v.0.3.2), farver(v.2.1.0), pkgconfig(v.2.0.3), sass(v.0.4.0), dbplyr(v.2.1.1), utf8(v.1.2.2), labeling(v.0.4.2), tidyselect(v.1.1.1), rlang(v.1.0.6), munsell(v.0.5.0), cellranger(v.1.1.0), tools(v.4.0.3), cli(v.3.4.1), generics(v.0.1.0), pacman(v.0.5.1), broom(v.1.0.1), mathjaxr(v.1.2-0), evaluate(v.0.14), fastmap(v.1.1.0), yaml(v.2.2.1), fs(v.1.5.2), zip(v.2.2.0), pbapply(v.1.4-3), nlme(v.3.1-151), xml2(v.1.3.2), compiler(v.4.0.3), rstudioapi(v.0.13), beeswarm(v.0.4.0), curl(v.4.3.2), reprex(v.2.0.1), bslib(v.0.3.0), stringi(v.1.7.4), highr(v.0.9), lattice(v.0.20-41), vctrs(v.0.5.0), pillar(v.1.8.1), lifecycle(v.1.0.3), jquerylib(v.0.1.4), estimability(v.1.3), data.table(v.1.14.0), R6(v.2.5.1), bookdown(v.0.24), rio(v.0.5.29), vipor(v.0.4.5), codetools(v.0.2-18), MASS(v.7.3-54), assertthat(v.0.2.1), rprojroot(v.2.0.2), withr(v.2.5.0), multcomp(v.1.4-17), parallel(v.4.0.3), mgcv(v.1.8-33), hms(v.1.1.0), coda(v.0.19-4), rmarkdown(v.2.11), carData(v.3.0-4), lubridate(v.1.7.10) and ggbeeswarm(v.0.6.0)

References

1. Midolo G, De Frenne P, Hölzel N, Wellstein C. Global patterns of intraspecific leaf trait responses to elevation. Global Change Biology. Wiley Online Library; 2019;25:2485–98.
2. Nakagawa S, Schielzeth H. A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in ecology and evolution. Wiley Online Library; 2013;4:133–42.