
Pool downstream model fits across multiple imputations (Rubin's rules)
Source:R/pool_mi.R
pool_mi.RdCombine regression coefficients from M model fits – one per imputed
dataset – into a single pooled table using Rubin's rules. The pooled
standard errors properly account for both within-imputation sampling
variance and between-imputation variance, so downstream inference
propagates the uncertainty introduced by imputation.
Arguments
- fits
A list of model fits of length
M >= 2. Any model class implementingcoef()andvcov()works (e.g.stats::lm,nlme::gls,lme4::lmer,glmmTMB::glmmTMB,phylolm::phylolm,phylolm::phyloglm). The output ofwith_imputations()is accepted directly.MCMCglmmfits are rejected – see Details.- conf.level
Confidence level for the pooled interval (default
0.95).- coef_fun
Function extracting a named numeric coefficient vector from one fit. Defaults to
stats::coef(). Supply a custom extractor for models wherecoef()returns a list (e.g.glmmTMBwith multiple components) – see Examples.- vcov_fun
Function extracting the variance-covariance matrix of the fixed-effect coefficients. Defaults to
stats::vcov(). Must return a square matrix whose row/column names matchcoef_fun(fit).- df_fun
Optional function returning the complete-data residual degrees of freedom
nu_comfrom one fit. When supplied, pooled degrees of freedom use the Barnard & Rubin (1999) small-sample correction, which is less biased for short series. WhenNULL(the default) the classical Rubin (1987) formula is used.
Value
A data.frame with one row per coefficient and columns:
termCoefficient name.
estimatePooled point estimate (
meanacross fits).std.errorPooled standard error
sqrt(T)whereT = W + (1 + 1/M) * B.dfPooled degrees of freedom (Barnard-Rubin if
df_funsupplied, else classical Rubin).statisticestimate / std.error.p.valueTwo-sided p-value from a t distribution on
df.conf.low,conf.highPooled
conf.levelinterval.fmiFraction of missing information.
rivRelative increase in variance due to non-response.
Details
Let \(\hat\theta_i\) be the coefficient vector from fit i and
\(U_i = \mathrm{vcov}(\mathrm{fit}_i)\), for \(i = 1, \ldots, M\).
Rubin's rules (Rubin 1987) give
$$\bar\theta = M^{-1} \sum_i \hat\theta_i$$
$$W = M^{-1} \sum_i \mathrm{diag}(U_i)$$
$$B = (M-1)^{-1} \sum_i (\hat\theta_i - \bar\theta)^2$$
$$T = W + (1 + 1/M) B$$
with pooled standard error \(\sqrt{T}\). The relative increase in
variance is \(r = (1 + 1/M) B / W\), the classical pooled df is
\(\nu_{\text{old}} = (M-1)(1 + 1/r)^2\), and the fraction of missing
information is
$$\mathrm{fmi} = (r + 2/(\nu + 3)) / (r + 1).$$
When df_fun returns finite complete-data df nu_com, the
Barnard-Rubin (1999) correction combines
\(\nu_{\text{obs}} = ((\nu_{\text{com}}+1)/(\nu_{\text{com}}+3))
\nu_{\text{com}} (1 - \lambda)\) with nu_old via
\(\nu_{\text{BR}} = 1/(1/\nu_{\text{old}} + 1/\nu_{\text{obs}})\).
MCMCglmm fits are rejected because Rubin's rules are not the right
tool for posterior samples: variance decomposition does not generalise
cleanly to posterior distributions. For a Bayesian pigauto workflow
(pigauto as imputer, MCMCglmm as inference engine), concatenate the
posterior samples across imputations manually – stack fit$Sol and
fit$VCV row-wise with do.call(rbind, ...) and wrap the result in
coda::as.mcmc(). For the frequentist Rubin's-rules path, see
vignette("mixed-types", package = "pigauto"). For an integrated
chained-equation MCMC workflow where imputation and inference happen
in a single engine, use the companion BACE package end-to-end
(BACE::bace() + BACE::pool_posteriors()).
References
Rubin DB (1987). Multiple Imputation for Nonresponse in Surveys. Wiley.
Barnard J, Rubin DB (1999). "Small-sample degrees of freedom with multiple imputation." Biometrika 86(4): 948-955.
Nakagawa S, Freckleton RP (2008). "Missing inaction: the dangers of ignoring missing data." Trends in Ecology & Evolution 23(11): 592-596.
Nakagawa S, Freckleton RP (2011). "Model averaging, missing data and multiple imputation: a case study for behavioural ecology." Behavioral Ecology and Sociobiology 65(1): 103-116.
Examples
if (FALSE) { # \dontrun{
# Typical workflow
mi <- multi_impute(traits, tree, m = 100)
fits <- with_imputations(mi, function(d) lm(y ~ x1 + x2, data = d))
pool_mi(fits)
# glmmTMB with custom extractors (conditional component only)
pool_mi(
fits,
coef_fun = function(f) glmmTMB::fixef(f)$cond,
vcov_fun = function(f) vcov(f)$cond
)
} # }