Extract the implied trait covariance / correlation at one tier

Implements the decomposition $$\boldsymbol\Sigma_\text{tier} \;=\; \underbrace{\boldsymbol\Lambda_\text{tier}\boldsymbol\Lambda_\text{tier}^\top}_{\text{shared (latent)}} \;+\; \underbrace{\mathbf S_\text{tier}}_{\text{unique (unique)}},$$ where $\boldsymbol\Lambda$ comes from the latent() term at that tier and $\mathbf S$ comes from the corresponding unique() term. This is the same decomposition the behavioural-syndromes / phenotypic-integration literature uses (Bartholomew et al. 2011; Nakagawa et al. in prep), equations 15, 22, and 30 of the methods paper.

Usage

extract_Sigma(
  fit,
  level = c("unit", "unit_obs", "phy", "spatial", "cluster", "B", "W", "spde"),
  part = c("total", "shared", "unique"),
  link_residual = c("auto", "none"),
  .skip_warn = FALSE
)

Arguments

fit: A gllvmTMB_multi fit.
level: One of "unit" (between-unit), "unit_obs" (within-unit), "phy" (phylogenetic), "spatial", or "cluster". Legacy aliases "B", "W", and "spde" are accepted with a soft-deprecation message.
part: One of "total" (default), "shared", "unique".
link_residual: For non-Gaussian fits. "auto" (default) adds a per-trait link-specific implicit residual variance to the diagonal of Sigma, giving the marginal latent-scale interpretation; in mixed- family fits each trait gets the residual implied by its family/link (see "Family-aware link residuals" below for the full table). "none" returns the latent+unique-implied $\boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S$ with no implicit residual added. For Gaussian or lognormal-only fits this argument is effectively a no-op (their implied $\sigma^2_d = 0$).
.skip_warn: Internal flag (default FALSE). When TRUE, suppresses the once-per-session deprecation message that the internal .normalise_level() helper emits for legacy level aliases ("B", "W", "spde", "total"). Used by internal callers (e.g. extract_Omega()) that have already issued the deprecation message themselves; not part of the public API.

Value

For part = "total" or "shared": a list with components Sigma (T x T matrix), R (T x T correlation matrix; only for "total"), level, part, and note (character vector of advisory messages, e.g. about a missing unique() term).

For part = "unique": a list with s (length-T named numeric vector of unique variances), level, part, note.

Details

Why both `latent()` and `unique()` matter

If the formula has only latent(0 + trait | unit, d = K) and no unique(0 + trait | unit), the engine can only fit the $\boldsymbol\Lambda \boldsymbol\Lambda^\top$ component – there is no slot for trait-specific unique variance $\mathbf S$. Calling extract_Sigma(fit, level, part = "total") then returns just $\boldsymbol\Lambda \boldsymbol\Lambda^\top$, which understates the diagonal of the true covariance. Any correlations computed from this incomplete $\hat{\boldsymbol\Sigma}$ are systematically inflated (the same numerator with a too-small denominator).

For Gaussian / lognormal / Gamma fits this function emits a one-shot message reminding the user to add + unique(0 + trait | unit) (or its within-unit analogue) when this happens. Add the unique() term and the decomposition is complete.

For non-Gaussian families (binomial, Poisson, Gamma) the latent-scale residual variance has a closed-form approximation that should be added to the diagonal of $\boldsymbol\Sigma$ – see the link_residual argument below.

The `part` argument

"total" (default): $\boldsymbol\Sigma_\text{tier} = \boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S$ – the matrix users almost always want for reporting correlations.
"shared": $\boldsymbol\Lambda \boldsymbol\Lambda^\top$ only – the reduced-rank, rotation-invariant component. Diagonals are $\sum_\ell \Lambda_{t\ell}^2$; these are not the trait variances, they are the shared part of the trait variances.
"unique": $\mathbf S_\text{tier}$ only – the trait-specific unique variances, returned as a length-T named numeric vector (the diagonal of $\mathbf S$).

Caveat: `"shared"` vs `"unique"` partition is only weakly identified

The total $\boldsymbol\Sigma_\text{tier} = \boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S$ is rotation-invariant and well-identified, so part = "total" is well-identified. But the split between $\boldsymbol\Lambda \boldsymbol\Lambda^\top$ and $\mathbf S$ is only weakly identified – different optimiser starts can flow trait $t$'s variance more into the shared ("shared") or unique ("unique") component, with the same total likelihood. In a synthetic Poisson + OLRE recovery run with target $\sigma^2_S = (0.5, 0.4, 0.3, 0.6)$, gllvmTMB returned $(0, 0.67, 0, 0.57)$ – total trait variances correct, but the partition arbitrary.

This is the standard rotation-and-shift indeterminacy of factor models (gllvm and Hmsc have it too); communality $c_t^2$ computed from a single fit therefore inherits the same indeterminacy. For interpretation, lean on part = "total" for correlations and use extract_communality() only when paired with bootstrap_Sigma() or the variance-decomposition consistency checks in gllvmTMB_diagnose().

Family-aware link residuals

For non-Gaussian responses each row carries an implicit observation-level residual on the latent (link) scale. Adding it to the per-trait diagonal of $\boldsymbol\Sigma$ is what makes cross-family Sigma comparable on the same (latent) scale: without this adjustment, binomial diagonals are too small relative to a Gaussian latent of comparable variance and the implied cross-family correlations are inflated.

Per-family formulas (Nakagawa & Schielzeth 2010; Nakagawa, Johnson & Schielzeth 2017):

`gaussian` (identity)	$\sigma^2_d = 0$
`binomial(link = "logit")`	$\sigma^2_d = \pi^2/3 \approx 3.290$
`binomial(link = "probit")`	$\sigma^2_d = 1$
`binomial(link = "cloglog")`	$\sigma^2_d = \pi^2/6 \approx 1.645$
`poisson(link = "log")`	$\sigma^2_d = \log(1 + 1/\hat\mu_t)$ (lognormal-Poisson approx.)
`lognormal(link = "log")`	$\sigma^2_d = 0$ (sigma_eps already models the log-scale residual)
`Gamma(link = "log")`	$\sigma^2_d = \psi'(\hat\nu)$ where $\hat\nu = 1/\hat\sigma_\varepsilon^2$ is the shape
`nbinom2(link = "log")`	$\sigma^2_d = \psi'(\hat\phi)$ where $\hat\phi$ is the per-trait NB2 dispersion
`tweedie(link = "log")`	$\sigma^2_d = \log(1 + \hat\phi \hat\mu_t^{\hat p - 2})$ (delta method)
`Beta(link = "logit")`	$\sigma^2_d = \psi'(\hat\mu_t \hat\phi) + \psi'((1 - \hat\mu_t)\hat\phi)$ (Smithson & Verkuilen 2006)
`betabinomial(link = "logit")`	$\sigma^2_d = \pi^2/3 + \psi'(\hat\mu_t \hat\phi) + \psi'((1 - \hat\mu_t)\hat\phi)$

For mixed-family fits the residual is computed per trait from the family of the rows belonging to that trait, then added to the diagonal of $\boldsymbol\Sigma$ entry-by-entry. The default link_residual = "auto" applies this; "none" returns the latent+unique-implied $\boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S$ with no implicit residual added. For continuous-only Gaussian or lognormal fits "auto" is a no-op (their per-trait $\sigma^2_d$ is zero).

Citations: Nakagawa & Schielzeth (2010); Nakagawa, Johnson & Schielzeth (2017) — see References below.

Future: 3+ latent tiers

The engine currently supports two latent() tiers ("unit" and "unit_obs"); legacy level = "B" and "W" aliases are still accepted with a soft-deprecation message. level = "phy" extracts the phylogenetic implied $\boldsymbol\Sigma_\text{phy}$ from phylo_latent(). If a future release adds 3+ latent tiers, level = "<colname>" will dispatch to the corresponding tier without API change. For now, custom strings error with a clear roadmap message.

References

Nakagawa, S. & Schielzeth, H. (2010). Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews 85, 935-956. doi:10.1111/j.1469-185X.2010.00141.x

Nakagawa, S., Johnson, P. C. D., & Schielzeth, H. (2017). The coefficient of determination $R^2$ and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface 14(134), 20170213. doi:10.1098/rsif.2017.0213

Examples

if (FALSE) { # \dontrun{
fit <- gllvmTMB(
  value ~ 0 + trait +
          latent(0 + trait | unit, d = 2) + unique(0 + trait | unit),
  data = df
)
extract_Sigma(fit, level = "unit", part = "total")$Sigma   # full T x T cov
extract_Sigma(fit, level = "unit", part = "shared")$Sigma  # rr-only
extract_Sigma(fit, level = "unit", part = "unique")$s      # diag(s_unit)
} # }

`gaussian` (identity)	\(\sigma^2_d = 0\)
`binomial(link = "logit")`	\(\sigma^2_d = \pi^2/3 \approx 3.290\)
`binomial(link = "probit")`	\(\sigma^2_d = 1\)
`binomial(link = "cloglog")`	\(\sigma^2_d = \pi^2/6 \approx 1.645\)
`poisson(link = "log")`	\(\sigma^2_d = \log(1 + 1/\hat\mu_t)\) (lognormal-Poisson approx.)
`lognormal(link = "log")`	\(\sigma^2_d = 0\) (sigma_eps already models the log-scale residual)
`Gamma(link = "log")`	\(\sigma^2_d = \psi'(\hat\nu)\) where \(\hat\nu = 1/\hat\sigma_\varepsilon^2\) is the shape
`nbinom2(link = "log")`	\(\sigma^2_d = \psi'(\hat\phi)\) where \(\hat\phi\) is the per-trait NB2 dispersion
`tweedie(link = "log")`	\(\sigma^2_d = \log(1 + \hat\phi \hat\mu_t^{\hat p - 2})\) (delta method)
`Beta(link = "logit")`	\(\sigma^2_d = \psi'(\hat\mu_t \hat\phi) + \psi'((1 - \hat\mu_t)\hat\phi)\) (Smithson & Verkuilen 2006)
`betabinomial(link = "logit")`	\(\sigma^2_d = \pi^2/3 + \psi'(\hat\mu_t \hat\phi) + \psi'((1 - \hat\mu_t)\hat\phi)\)