Separate OLRE residual variance from the distribution-specific latent residual

For a model with an additive overdispersion term, i.e. an observation-level random effect (OLRE), $$\eta_{it} = (\mathbf{X}\boldsymbol\beta)_{it} + \ldots + e_{it}, \quad e_{it} \sim N(0, \sigma^2_{e,t}),$$ the total latent-scale residual variance for trait $t$ is $$\sigma^2_{d,t} + \sigma^2_{e,t},$$ where $\sigma^2_{d,t}$ is the distribution-specific (theoretical) component and $\sigma^2_{e,t}$ is the estimated OLRE variance, i.e. the trait-$t$ diagonal element of the within-unit unique covariance $\mathbf{S}_W$. The value of $\sigma^2_{d,t}$ is fixed by the family and link for binomial-type families. For other families it also depends on the fitted mean or dispersion.
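The following is a minimal numeric sketch of the sum above for a single Poisson (log-link) trait. It uses the lognormal-Poisson entry from the per-family table. `mu_hat` and `sigma2_e` are made-up illustrative values, not output from a real fit.

```r
## Illustrative (assumed) values for one Poisson/log trait; not from a real fit.
mu_hat   <- 4     # fitted mean count for trait t (assumed)
sigma2_e <- 0.3   # estimated OLRE variance for trait t (assumed)

## Distribution-specific component for poisson/log (lognormal-Poisson approximation)
sigma2_d <- log(1 + 1 / mu_hat)

## Total latent-scale residual variance for trait t
sigma2_total <- sigma2_d + sigma2_e
round(c(sigma2_d = sigma2_d, sigma2_e = sigma2_e, total = sigma2_total), 4)
```

With these inputs $\sigma^2_{d,t} = \log(1.25) \approx 0.2231$, so the total is about $0.5231$. The two components are simply added on the latent (link) scale.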

Usage

extract_residual_split(fit)

Arguments

fit: A gllvmTMB_multi fit.

Value

A data frame with one row per trait and columns:

trait: Factor of trait names.
sigma2_d: Theoretical or parameter-dependent distribution-specific latent residual variance. It is computed by the internal link_residual_per_trait() helper and is zero for gaussian and lognormal (see the per-family table below).
sigma2_e: Estimated OLRE variance per trait. This is the per-trait diagonal of $\mathbf{S}_W$ when the fit has a genuine observation-level unique() term, and 0 otherwise.
sigma2_total: sigma2_d + sigma2_e.
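The returned columns can be combined directly. One example is the share of each trait's latent residual variance that comes from the OLRE. The sketch below uses a hypothetical stand-in data frame with the same columns. The trait names and numbers are invented for illustration. With a real fit, `rs <- extract_residual_split(fit)` would replace the stand-in.

```r
## Hypothetical stand-in with the columns documented above; values are invented.
rs <- data.frame(
  trait    = factor(c("sp1", "sp2")),
  sigma2_d = log(1 + 1 / c(4, 10)),  # e.g. two poisson/log traits with fitted means 4 and 10
  sigma2_e = c(0.3, 0.1)             # assumed OLRE variances
)
rs$sigma2_total <- rs$sigma2_d + rs$sigma2_e

## Proportion of each trait's latent residual variance attributable to the OLRE
rs$prop_olre <- rs$sigma2_e / rs$sigma2_total
rs
```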

Details

The function detects whether the fit includes a genuine observation-level random effect. This is a unique(0 + trait | <obs-level>) term in which every (trait, observation-level) combination occurs in exactly one row of the data. When this cell-uniqueness condition holds, sigma2_e is taken from the diagonal of $\mathbf{S}_W$. Otherwise it is set to zero.
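The cell-uniqueness rule can be checked on the data before fitting. The sketch below uses a hypothetical helper, `is_obs_level_grouping()`, which is not a gllvmTMB function and not the package's internal check. It only tests that no (trait, grouping) pair appears in more than one row.

```r
## Hypothetical helper (not part of gllvmTMB): TRUE when every (trait, group)
## combination appears in at most one row, so the grouping factor can act as
## an observation-level random effect.
is_obs_level_grouping <- function(data, group, trait = "trait") {
  cells <- data[, c(trait, group)]
  nrow(data) > 0 && anyDuplicated(cells) == 0L
}

## Small long-format example: two traits, three rows each (values arbitrary)
df <- data.frame(trait = factor(rep(c("a", "b"), each = 3)), value = 1:6)
df$obs   <- factor(seq_len(nrow(df)))      # one level per row
df$site  <- factor(rep(1:3, times = 2))    # each (trait, site) pair once
df$block <- factor(rep(c(1, 1, 2), times = 2))  # (a, 1) and (b, 1) repeat

is_obs_level_grouping(df, "obs")    # TRUE
is_obs_level_grouping(df, "site")   # TRUE
is_obs_level_grouping(df, "block")  # FALSE
```

Note that `site` qualifies even though each site carries both traits. The condition concerns repeated (trait, level) pairs, not the number of rows per level.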

Terminology note

Nakagawa & Schielzeth (2010) use $\sigma^2_d$ for both components. Nakagawa, Johnson & Schielzeth (2017) §7 refine the terminology: $\sigma^2_d$ (distribution-specific) applies only to binomial-type families whose link function introduces a fixed latent-scale variance; $\sigma^2_\varepsilon$ (observation-level) applies to overdispersed Poisson / NB / Gamma and is estimated from the data. gllvmTMB keeps the colloquial sigma2_d column name for compatibility but documents the distinction here (NJS 2017 §7).

Per-family $\sigma^2_d$ table

Family	Link	$\sigma^2_d$
`gaussian`	identity	0
`binomial`	logit	$\pi^2/3 \approx 3.290$
`binomial`	probit	$1$
`binomial`	cloglog	$\pi^2/6 \approx 1.645$
`poisson`	log	$\log(1 + 1/\hat{\mu}_t)$ (lognormal-Poisson approx.)
`lognormal`	log	0
`Gamma`	log	$\psi_1(\hat\nu)$, $\hat\nu = 1/\hat\sigma_\varepsilon^2$
`Beta`	logit	$\psi_1(\hat\mu_t \hat\phi) + \psi_1((1 - \hat\mu_t)\hat\phi)$ (Smithson & Verkuilen 2006)
`betabinomial`	logit	$\pi^2/3 + \psi_1(\hat\mu_t \hat\phi) + \psi_1((1 - \hat\mu_t)\hat\phi)$
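The table above can be transcribed into a small lookup function. The sketch below is hypothetical: `sigma2_d_sketch()` is not the internal link_residual_per_trait() helper, and it takes the fitted mean `mu`, precision `phi` and Gamma shape `nu` as given. How gllvmTMB obtains $\hat\mu_t$ from a fit is not specified here and is left as an assumption.

```r
## Hypothetical helper (not gllvmTMB's link_residual_per_trait()): distribution-
## specific latent residual variance sigma^2_d, following the per-family table.
## mu  = fitted mean on the response scale (assumed supplied by the caller)
## phi = Beta / beta-binomial precision; nu = Gamma shape (1 / sigma_eps^2)
sigma2_d_sketch <- function(family, link, mu = NA_real_, phi = NA_real_, nu = NA_real_) {
  key <- paste(family, link, sep = "/")
  switch(key,
    "gaussian/identity"  = 0,
    "lognormal/log"      = 0,
    "binomial/logit"     = pi^2 / 3,
    "binomial/probit"    = 1,
    "binomial/cloglog"   = pi^2 / 6,
    "poisson/log"        = log(1 + 1 / mu),
    "Gamma/log"          = trigamma(nu),
    "Beta/logit"         = trigamma(mu * phi) + trigamma((1 - mu) * phi),
    "betabinomial/logit" = pi^2 / 3 + trigamma(mu * phi) + trigamma((1 - mu) * phi),
    stop("no sigma^2_d entry for ", key)
  )
}

sigma2_d_sketch("binomial", "logit")             # 3.289868
sigma2_d_sketch("poisson", "log", mu = 4)        # 0.2231436
sigma2_d_sketch("Beta", "logit", mu = 0.5, phi = 2)
```

The last call is a useful sanity check. With $\hat\mu_t \hat\phi = (1 - \hat\mu_t)\hat\phi = 1$, each trigamma term equals $\psi_1(1) = \pi^2/6$, so the Beta entry reduces to $\pi^2/3$, the same value as the binomial logit row.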

References

Nakagawa, S. & Schielzeth, H. (2010) Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews 85(4): 935-956. doi:10.1111/j.1469-185X.2010.00141.x

Nakagawa, S., Johnson, P. C. D. & Schielzeth, H. (2017) The coefficient of determination $R^2$ and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface 14(134): 20170213. doi:10.1098/rsif.2017.0213

Examples

if (FALSE) { # \dontrun{
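## Assumes `df` is a long-format data frame with a response column `value`
## and a factor `trait`. Add a site_species column (one level per row) as the
## obs-level grouping.
df$site_species <- factor(seq_len(nrow(df)))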
fit <- gllvmTMB(
  value ~ 0 + trait + unique(0 + trait | site_species),
  data = df, family = poisson()
)
extract_residual_split(fit)
} # }