
Extract the implied trait covariance / correlation at one tier
Source:R/extract-sigma.R
extract_Sigma.RdImplements the decomposition
$$\boldsymbol\Sigma_\text{tier} \;=\; \underbrace{\boldsymbol\Lambda_\text{tier}\boldsymbol\Lambda_\text{tier}^\top}_{\text{shared (latent)}} \;+\; \underbrace{\mathbf S_\text{tier}}_{\text{unique (unique)}},$$
where \(\boldsymbol\Lambda\) comes from the latent() term at that tier
and \(\mathbf S\) comes from the corresponding unique() term. This is the
same decomposition the behavioural-syndromes / phenotypic-integration
literature uses (Bartholomew et al. 2011; Nakagawa et al. in prep),
equations 15, 22, and 30 of the methods paper.
Arguments
- fit
A
gllvmTMB_multifit.- level
One of
"unit"(between-unit),"unit_obs"(within-unit),"phy"(phylogenetic),"spatial", or"cluster". Legacy aliases"B","W", and"spde"are accepted with a soft-deprecation message.- part
One of
"total"(default),"shared","unique".- link_residual
For non-Gaussian fits.
"auto"(default) adds a per-trait link-specific implicit residual variance to the diagonal ofSigma, giving the marginal latent-scale interpretation; in mixed- family fits each trait gets the residual implied by its family/link (see "Family-aware link residuals" below for the full table)."none"returns the latent+unique-implied \(\boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S\) with no implicit residual added. For Gaussian or lognormal-only fits this argument is effectively a no-op (their implied \(\sigma^2_d = 0\)).- .skip_warn
Internal flag (default
FALSE). WhenTRUE, suppresses the once-per-session deprecation message that the internal.normalise_level()helper emits for legacylevelaliases ("B","W","spde","total"). Used by internal callers (e.g.extract_Omega()) that have already issued the deprecation message themselves; not part of the public API.
Value
For part = "total" or "shared": a list with components
Sigma (T x T matrix), R (T x T correlation matrix; only for
"total"), level, part, and note (character vector of
advisory messages, e.g. about a missing unique() term).
For part = "unique": a list with s (length-T named numeric
vector of unique variances), level, part, note.
Details
Why both latent() and unique() matter
If the formula has only latent(0 + trait | unit, d = K) and no
unique(0 + trait | unit), the engine can only fit the
\(\boldsymbol\Lambda \boldsymbol\Lambda^\top\) component – there is no
slot for trait-specific unique variance \(\mathbf S\). Calling
extract_Sigma(fit, level, part = "total") then returns just
\(\boldsymbol\Lambda \boldsymbol\Lambda^\top\), which understates the
diagonal of the true covariance. Any correlations computed from this
incomplete \(\hat{\boldsymbol\Sigma}\) are systematically inflated
(the same numerator with a too-small denominator).
For Gaussian / lognormal / Gamma fits this function emits a one-shot
message reminding the user to add + unique(0 + trait | unit) (or its
within-unit analogue) when this happens. Add the unique() term and the
decomposition is complete.
For non-Gaussian families (binomial, Poisson, Gamma) the latent-scale
residual variance has a closed-form approximation that should be added
to the diagonal of \(\boldsymbol\Sigma\) – see the link_residual
argument below.
The part argument
"total"(default)\(\boldsymbol\Sigma_\text{tier} = \boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S\) – the matrix users almost always want for reporting correlations.
"shared"\(\boldsymbol\Lambda \boldsymbol\Lambda^\top\) only – the reduced-rank, rotation-invariant component. Diagonals are \(\sum_\ell \Lambda_{t\ell}^2\); these are not the trait variances, they are the shared part of the trait variances.
"unique"\(\mathbf S_\text{tier}\) only – the trait-specific unique variances, returned as a length-
Tnamed numeric vector (the diagonal of \(\mathbf S\)).
Caveat: "shared" vs "unique" partition is only weakly identified
The total \(\boldsymbol\Sigma_\text{tier} = \boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S\)
is rotation-invariant and well-identified, so part = "total" is
well-identified. But the split between \(\boldsymbol\Lambda \boldsymbol\Lambda^\top\)
and \(\mathbf S\) is only weakly identified – different optimiser
starts can flow trait \(t\)'s variance more into the shared
("shared") or unique ("unique") component, with the same total
likelihood. In a synthetic Poisson + OLRE recovery run with target
\(\sigma^2_S = (0.5, 0.4, 0.3, 0.6)\), gllvmTMB returned
\((0, 0.67, 0, 0.57)\) – total trait variances correct, but the
partition arbitrary.
This is the standard rotation-and-shift indeterminacy of factor models
(gllvm and Hmsc have it too); communality \(c_t^2\) computed from
a single fit therefore inherits the same indeterminacy. For
interpretation, lean on part = "total" for correlations and use
extract_communality() only when paired with bootstrap_Sigma() or
the variance-decomposition consistency checks in
gllvmTMB_diagnose().
Family-aware link residuals
For non-Gaussian responses each row carries an implicit observation-level residual on the latent (link) scale. Adding it to the per-trait diagonal of \(\boldsymbol\Sigma\) is what makes cross-family Sigma comparable on the same (latent) scale: without this adjustment, binomial diagonals are too small relative to a Gaussian latent of comparable variance and the implied cross-family correlations are inflated.
Per-family formulas (Nakagawa & Schielzeth 2010; Nakagawa, Johnson & Schielzeth 2017):
gaussian (identity) | \(\sigma^2_d = 0\) |
binomial(link = "logit") | \(\sigma^2_d = \pi^2/3 \approx 3.290\) |
binomial(link = "probit") | \(\sigma^2_d = 1\) |
binomial(link = "cloglog") | \(\sigma^2_d = \pi^2/6 \approx 1.645\) |
poisson(link = "log") | \(\sigma^2_d = \log(1 + 1/\hat\mu_t)\) (lognormal-Poisson approx.) |
lognormal(link = "log") | \(\sigma^2_d = 0\) (sigma_eps already models the log-scale residual) |
Gamma(link = "log") | \(\sigma^2_d = \psi'(\hat\nu)\) where \(\hat\nu = 1/\hat\sigma_\varepsilon^2\) is the shape |
nbinom2(link = "log") | \(\sigma^2_d = \psi'(\hat\phi)\) where \(\hat\phi\) is the per-trait NB2 dispersion |
tweedie(link = "log") | \(\sigma^2_d = \log(1 + \hat\phi \hat\mu_t^{\hat p - 2})\) (delta method) |
Beta(link = "logit") | \(\sigma^2_d = \psi'(\hat\mu_t \hat\phi) + \psi'((1 - \hat\mu_t)\hat\phi)\) (Smithson & Verkuilen 2006) |
betabinomial(link = "logit") | \(\sigma^2_d = \pi^2/3 + \psi'(\hat\mu_t \hat\phi) + \psi'((1 - \hat\mu_t)\hat\phi)\) |
For mixed-family fits the residual is computed per trait from the
family of the rows belonging to that trait, then added to the diagonal
of \(\boldsymbol\Sigma\) entry-by-entry. The default
link_residual = "auto" applies this; "none" returns the latent+unique-implied
\(\boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S\) with no
implicit residual added. For continuous-only Gaussian or lognormal fits
"auto" is a no-op (their per-trait \(\sigma^2_d\) is zero).
Citations: Nakagawa & Schielzeth (2010); Nakagawa, Johnson & Schielzeth (2017) — see References below.
Future: 3+ latent tiers
The engine currently supports two latent() tiers ("unit" and
"unit_obs"); legacy level = "B" and "W" aliases are still accepted
with a soft-deprecation message.
level = "phy" extracts the phylogenetic implied
\(\boldsymbol\Sigma_\text{phy}\) from phylo_latent(). If a future
release adds 3+ latent tiers, level = "<colname>" will dispatch to the
corresponding tier without API change. For now, custom strings error
with a clear roadmap message.
References
Nakagawa, S. & Schielzeth, H. (2010). Repeatability for Gaussian and non-Gaussian data: a practical guide for biologists. Biological Reviews 85, 935-956. doi:10.1111/j.1469-185X.2010.00141.x
Nakagawa, S., Johnson, P. C. D., & Schielzeth, H. (2017). The coefficient of determination \(R^2\) and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. Journal of the Royal Society Interface 14(134), 20170213. doi:10.1098/rsif.2017.0213
See also
extract_communality() for the per-trait shared / unique
variance share at one tier; extract_proportions() for the
canonical per-trait variance decomposition across tiers;
extract_Omega() for the multi-tier sum.
Examples
if (FALSE) { # \dontrun{
fit <- gllvmTMB(
value ~ 0 + trait +
latent(0 + trait | unit, d = 2) + unique(0 + trait | unit),
data = df
)
extract_Sigma(fit, level = "unit", part = "total")$Sigma # full T x T cov
extract_Sigma(fit, level = "unit", part = "shared")$Sigma # rr-only
extract_Sigma(fit, level = "unit", part = "unique")$s # diag(s_unit)
} # }