Extract cross-trait correlations

Returns the implied cross-trait correlations from a fit returned by gllvmTMB() at one or more covariance levels. Use the canonical input names "unit", "unit_obs", "phy", and "spatial"; legacy aliases "B", "W", and "spde" still work. By default the helper returns point estimates only. Fisher-z and bootstrap intervals at the requested level (default 95%) must be requested explicitly with the method argument. The former "profile" token is retained only so that it can return a clear withdrawal error.

Usage

extract_correlations(
  fit,
  tier = "all",
  pair = NULL,
  level = 0.95,
  method = c("none", "fisher-z", "profile", "wald", "bootstrap"),
  n_eff = NULL,
  nsim = 500L,
  seed = NULL,
  link_residual = c("auto", "none")
)

Arguments

fit

A fit returned by gllvmTMB. Julia bridge (engine = "julia") fits expose the ordinary unit tier only and carry no correlation-interval payload, so they return point-only rows: the same schema with lower and upper set to NA, method = "none", and interval_status = "none". Use engine = "tmb" when you need correlation confidence intervals.

tier

Character vector. Use "all" (the default) to request every level present in the fit. Canonical inputs are "unit", "unit_obs", "phy", and "spatial"; legacy aliases "B", "W", and "spde" are accepted.
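Because inputs accept both canonical names and legacy aliases, while the output tier column currently stores the internal labels (see Value), it can help to normalise names before matching output rows. The following is a minimal sketch. to_internal_tier() is a hypothetical helper, not part of gllvmTMB, and it deliberately does not handle "all", which expands to whatever tiers the fit contains.

```r
# Hypothetical helper (not a gllvmTMB function): map canonical tier names
# and legacy aliases to the internal labels used in the `tier` output column.
to_internal_tier <- function(tier) {
  map <- c(unit = "B", unit_obs = "W", phy = "phy", spatial = "spde",
           B = "B", W = "W", spde = "spde")
  out <- unname(map[tier])          # unknown names index to NA
  if (anyNA(out)) {
    stop("unknown tier: ", paste(tier[is.na(out)], collapse = ", "))
  }
  out
}

to_internal_tier(c("unit", "W", "spatial"))  # "B" "W" "spde"
```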

pair

Optional length-2 character or integer vector specifying one trait pair (c("trait_1", "trait_2") or c(1, 2)). When supplied, only that pair is returned for each requested tier. Default NULL (all pairs).

level

Confidence level in (0, 1). Default 0.95.

method

One of "none" (default), "fisher-z", "wald" (alias of "fisher-z"), or "bootstrap". The "profile" token is still accepted by the signature but has been withdrawn: it stops with an explanation. See Details.

n_eff

Optional positive integer (>= 4): override the effective sample size used in Fisher's $\widehat{\mathrm{SE}}(\hat z) = 1/\sqrt{n_{\text{eff}} - 3}$ formula. This is a user-supplied sensitivity parameter, not an estimated effective sample size. It is consulted for method %in% c("fisher-z", "wald") and for explicit Wald fallback rows when a requested bootstrap tier is not yet routed. The default heuristic uses fit$n_sites for tier "B" ("unit"), fit$n_site_species for "W" ("unit_obs"), fit$n_species for "phy", and fit$n_sites elsewhere. These counts are only transparent heuristics: grouped, phylogenetic, spatial, latent, and non-Gaussian structure can make the iid Fisher variance inappropriate. If the automatic tier count is missing or below 4, Fisher-z rows return point correlations with unavailable interval bounds rather than substituting an arbitrary sample size. Set n_eff only when the analysis supplies a defensible target-specific rationale, and report that rationale.
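To see how strongly the bounds depend on n_eff, the Fisher formula can be evaluated directly. This is a minimal base-R sketch for a single correlation, assuming a scalar n_eff. fisher_z_ci() is a hypothetical helper that mirrors the documented rule of returning unavailable (NA) bounds when n_eff is missing or below 4; it is not the package's internal code.

```r
# Hypothetical sketch of the Fisher-z heuristic for one correlation `rho`.
# n_eff is a user-supplied sensitivity parameter; when it is missing or
# below 4 the bounds are NA rather than an arbitrary substitute.
fisher_z_ci <- function(rho, n_eff, level = 0.95) {
  if (is.null(n_eff) || is.na(n_eff) || n_eff < 4) {
    return(c(lower = NA_real_, upper = NA_real_))
  }
  z    <- atanh(rho)                         # z-hat on the Fisher scale
  se   <- 1 / sqrt(n_eff - 3)                # iid SE, not a mixed-model SE
  crit <- stats::qnorm(1 - (1 - level) / 2)  # about 1.96 for level = 0.95
  c(lower = tanh(z - crit * se),             # tanh() back-transform keeps
    upper = tanh(z + crit * se))             # both bounds inside [-1, 1]
}

# Sensitivity display: the same rho under several candidate n_eff values.
sapply(c(10, 30, 60), function(n) fisher_z_ci(0.5, n))
fisher_z_ci(0.5, n_eff = 3)  # below 4: lower = NA, upper = NA
```

The interval narrows as n_eff grows, which is exactly why an unjustified n_eff can make the bounds look more precise than the fitted model supports.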

nsim

Number of bootstrap replicates when method = "bootstrap". Default 500.

seed

Optional RNG seed for the bootstrap.

link_residual

How to treat the family-specific link-residual variance on the diagonal of the implied $\boldsymbol\Sigma$ before computing correlations:

"auto" (default): For non-Gaussian fits, add the link-specific implicit residual (e.g. $\pi^2/3$ for binomial-logit; $1$ for probit; trigamma() terms for Gamma / NB2 / Beta / etc.) to the diagonal before computing correlations. Returned correlations are on the latent-liability scale; this is the convention most readers expect. Gaussian fits are unaffected (link residual is $0$).
"none": Use the fitted model-implied $\Sigma$ directly with no link-residual addition. For ordinary latent() fits this includes the default diagonal Psi companion; correlations come out on the model-implied scale without the family adjustment.

Behaviour change in this release: the previous version hardcoded link_residual = "none". Non-Gaussian callers who relied on that behaviour will see different correlation values under the new default. A one-shot warning fires the first time per session that a non-Gaussian fit is processed without an explicit link_residual argument. Pass link_residual = "auto" to suppress the warning and lock the new behaviour, or link_residual = "none" to restore the previous behaviour.
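The effect of the "auto" adjustment on a single covariance matrix can be sketched in base R: add the implicit link residual to the diagonal, then rescale to correlations. This sketch assumes a plain covariance matrix Sigma standing in for the model-implied one, and it covers only the logit ($\pi^2/3$) and probit ($1$) constants stated above; the family-specific trigamma() terms are out of scope. latent_scale_cor() is a hypothetical helper, not a gllvmTMB function.

```r
# Hypothetical sketch: add a link-specific implicit residual to the diagonal
# of a covariance matrix, then convert it to a correlation matrix.
latent_scale_cor <- function(Sigma, link = c("identity", "logit", "probit")) {
  link  <- match.arg(link)
  resid <- switch(link,
    identity = 0,          # Gaussian: no link residual
    logit    = pi^2 / 3,   # binomial-logit
    probit   = 1           # binomial-probit
  )
  diag(Sigma) <- diag(Sigma) + resid
  stats::cov2cor(Sigma)
}

Sigma <- matrix(c(1.0, 0.6,
                  0.6, 1.0), 2, 2)
latent_scale_cor(Sigma, "identity")[1, 2]  # 0.6: nothing added
latent_scale_cor(Sigma, "probit")[1, 2]    # 0.6 / (1 + 1) = 0.3
```

Because the residual inflates only the diagonal, the adjusted correlations are shrunk toward zero relative to the unadjusted ones, which is why non-Gaussian results differ between "auto" and "none".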

Value

A data frame (tibble-like) with columns:

tier: Character level label. The current output stores internal labels "B", "W", "phy", and "spde"; use "unit" and "unit_obs" as input names in new calls.
trait_i, trait_j: Trait names with i < j.
correlation: Point estimate.
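lower, upper: Confidence-interval bounds; NA when method = "none" or when no interval is available (for example, a Fisher-z row whose automatic tier count is missing or below 4).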
method: Method used to compute the CI.
interval_status: Claim-boundary marker: "none" for point-only output, "heuristic_unvalidated" for Fisher-z/Wald bounds, and "target_specific_uncalibrated" for bootstrap bounds. The bootstrap route computes intervals, but this function does not certify their frequentist coverage for the fitted target.

For an engine = "julia" bridge fit, lower/upper are NA, method = "none", and interval_status = "none".
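When reporting results it can be useful to check which claim boundary each row carries. The sketch below only restates the documented method-to-status mapping as a lookup table; interval_status_for() is a hypothetical helper, and it does not cover bootstrap fallback rows for tiers such as SPDE, whose labelling is not specified here.

```r
# Hypothetical helper restating the documented method -> interval_status
# mapping. Note that none of these statuses certifies frequentist coverage.
interval_status_for <- function(method) {
  status <- c("none"      = "none",
              "fisher-z"  = "heuristic_unvalidated",
              "wald"      = "heuristic_unvalidated",
              "bootstrap" = "target_specific_uncalibrated")
  unname(status[method])
}

interval_status_for(c("none", "wald", "bootstrap"))
```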

Details

"none" (default): point correlations only. No universal interval calibration has been established across covariance tiers, families, and targets.
"fisher-z": Fisher's z-transform heuristic interval. Computes $\hat z = \mathrm{atanh}(\hat\rho)$, $\widehat{\mathrm{SE}}(\hat z) = 1/\sqrt{n_{\text{eff}} - 3}$, constructs the CI on z, then back-transforms via $\tanh(\cdot)$ (so bounds are guaranteed inside $[-1, 1]$). Fast (seconds for any T), but the classical iid-correlation variance is not a calibrated mixed-model standard error. Treat these bounds as a sensitivity display and see n_eff.
"profile": withdrawn. The nonlinear penalty-profile prototype did not yet enforce its constraint strictly enough, nor did it provide a diagnostic contract for the constrained optimizer. This token stops with an explanation rather than returning bounds.
"wald": backward-compatible alias of "fisher-z" with the same numerics. Emits a one-shot message pointing at the canonical name. Kept for scripts that filter on method == "wald".
"bootstrap": parametric bootstrap via bootstrap_Sigma. Slowest (full sampling distribution); use when a fitted model gives useful point estimates but Hessian- or profile-based intervals are not the right uncertainty summary. Structured tiers not yet resampled by bootstrap_Sigma, currently the SPDE spatial tier, return an explicit Wald/Fisher-z fallback with a message rather than fake bootstrap support.

For T traits at one tier, there are T(T-1)/2 unique correlations. A fit with T = 6 and four covariance levels present has up to 60 cross-trait correlations to report.
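The count above, and the i < j ordering of trait_i and trait_j, can be checked with base R's combn(), which enumerates each unordered pair once in increasing order. The trait and tier counts below are the illustrative values from the text.

```r
# Unique cross-trait pairs per tier: T(T - 1)/2, enumerated with i < j.
n_traits <- 6
n_tiers  <- 4
pairs <- t(utils::combn(n_traits, 2))  # one row per (trait_i, trait_j)
nrow(pairs)                            # 15, i.e. choose(6, 2)
nrow(pairs) * n_tiers                  # 60 correlations across four tiers
```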

Caveats

The former nonlinear penalty-profile route is withheld pending an exact constraint solver and explicit constrained-fit diagnostics.
Bootstrap uses bootstrap_Sigma refits and is the practical fallback when point estimates are useful but Hessian- or profile-based intervals are unavailable. Inspect bootstrap warnings, failed replicates, and interval width before treating the intervals as final.
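A small summary of the replicates makes those checks concrete: count failed refits, then look at the interval and its width. This is a hypothetical sketch; summarise_boot() is not a gllvmTMB function, the replicates are simulated stand-ins rather than output from bootstrap_Sigma, NA is assumed to mark a failed refit, and the percentile interval is an assumption that may differ from what bootstrap_Sigma reports.

```r
# Hypothetical sketch: summarise bootstrap replicates of one correlation.
# NA entries are treated as failed refits; the percentile interval is an
# assumption, not necessarily the method bootstrap_Sigma uses.
summarise_boot <- function(reps, level = 0.95) {
  ok <- reps[!is.na(reps)]
  a  <- (1 - level) / 2
  q  <- stats::quantile(ok, c(a, 1 - a), names = FALSE)
  list(n_failed = sum(is.na(reps)),
       n_used   = length(ok),
       lower    = q[1],
       upper    = q[2],
       width    = q[2] - q[1])
}

set.seed(42)
reps <- tanh(rnorm(200, atanh(0.4), 0.15))  # simulated stand-in replicates
reps[c(3, 17, 90)] <- NA                    # pretend three refits failed
s <- summarise_boot(reps)
s$n_failed  # 3
```

A high failure count or a very wide interval is a signal to revisit the model before reporting the bounds.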

Examples

if (FALSE) { # \dontrun{
set.seed(1)
s <- simulate_site_trait(
  n_sites = 80, n_species = 6, n_traits = 4,
  mean_species_per_site = 4,
  Lambda_B = matrix(c(0.9, 0.4, -0.3, 0.5), 4, 1),
  psi_B = c(0.4, 0.3, 0.5, 0.2)
)
fit <- gllvmTMB(
  value ~ 0 + trait + latent(0 + trait | site, d = 1),
  data  = s$data,
  trait = "trait",
  unit  = "site"
)
## Default: point correlations only.
cors <- extract_correlations(fit, tier = "unit")
## Opt-in Fisher-z sensitivity bounds with an explicitly justified n_eff.
cors2 <- extract_correlations(fit, tier = "unit", method = "fisher-z",
                               n_eff = 60L)
## Bootstrap with nsim = 200 replicates.
cors_b <- extract_correlations(fit, tier = "unit", method = "bootstrap",
                               nsim = 200, seed = 42)
} # }