Top-level entry point for the multivariate site–trait model of Nakagawa et al. (in prep) functional biogeography (and the related behavioural-syndromes / phylogenetic / systematic-mapping reduced-rank models). Long-format input with one row per (unit, trait) observation; glmmTMB-style formula syntax for fixed effects, plus a 3 x 5 grid of covariance-structure keywords organised by correlation (none / phylo / spatial) and mode (scalar / unique / indep / dep / latent):
Usage
gllvmTMB(
formula,
data,
trait = "trait",
unit = "site",
unit_obs = "site_species",
cluster = "species",
family = gaussian(),
weights = NULL,
mesh = NULL,
phylo_vcv = NULL,
phylo_tree = NULL,
known_V = NULL,
lambda_constraint = NULL,
control = gllvmTMBcontrol(),
silent = TRUE,
site = NULL,
species = NULL
)Arguments
- formula
A glmmTMB-style formula, e.g.
value ~ 0 + trait + (0 + trait):env_temp + (0 + trait):env_precip. Fixed effects and any of the 3 x 5 keyword-grid covstructs above are supported (plusphylo_slope()andmeta_known_V()).- data
A long-format data frame with one row per (unit, trait) observation. Must include the response (LHS of
formula), the columns named inunit,unit_obs,cluster,trait, and any predictors referenced in the formula.- trait
Name of the column holding the trait factor (the "trait" dimension of the unit × trait response matrix). Default
"trait".- unit
Name of the column holding the between-unit grouping factor (the "unit" dimension of the unit × trait response matrix). Examples:
"site"for site × species data,"individual"for behavioural-syndrome data,"species"for PGLLVM,"paper"for systematic mapping. Default"site".- unit_obs
Name of the column holding the within-unit grouping factor — one level per (unit, replicate) cell — used by
latent(0 + trait | unit_obs, ...)andunique(0 + trait | unit_obs)for the W-tier covariance. Default"site_species"(the conventional name in joint species distribution modelling; safe for site × species data). For other domains pass e.g.unit_obs = "obs"for behavioural syndromes.- cluster
Name of the column holding the third grouping factor (the "cluster" slot). When the column name matches the row/column names of
phylo_vcv(or the tip labels ofphylo_tree), this slot also drives the phylogenetic random effects (phylo_latent,phylo_scalar,phylo_unique,phylo_slope). When the column does not match a phylogenetic correlation, the slot still functions as a regular crossed/nested third grouping (e.g.cluster = "population"for 3-level personality data,cluster = "study"for multi-study meta-analysis). Default"species". For datasets with no third grouping the column may be absent fromdata; the engine will synthesise a placeholder.The engine does not enforce nesting between
unit_obs,unit, andcluster— crossed and nested designs both fit. Two canonical patterns:Functional biogeography (crossed):
unit = "site",unit_obs = "site_species",cluster = "species". Sites and species are crossed;site_speciesindexes the intersection cell.3-level personality (strictly nested):
unit = "individual",unit_obs = "session_id",cluster = "population".
- family
A
familyobject. The multivariate engine (formulas withlatent(),unique(), etc.) supportsgaussian(),binomial()(logit / probit / cloglog),poisson()(log link),ordinal_probit()(the gllvmTMB-native ordinal threshold family with sigma_d = 1 fixed exactly; no delta-method approximation),lognormal()(log link),Gamma(link = "log"),nbinom2()(NB2 negative binomial; log link),tweedie()(compound Poisson-Gamma; log link),Beta()(logit link, mean-precision parameterisation; y in (0, 1)),betabinomial()(logit link, mean-precision parameterisation; k-of-n data),student()(heavy-tailed continuous; identity link),truncated_poisson()(zero-truncated count; log link),truncated_nbinom2()(zero-truncated NB2; log link), and the hurdle / two-part familiesdelta_lognormal()/delta_gamma()(Bernoulli for presence + Lognormal/Gamma for the positive component; share one linear predictor under the current implementation – see Details). For multi-trial binomial or beta-binomial (k-of-n) data, write the LHS offormulaascbind(successes, failures)(the canonical R /glm()/glmmTMBconvention); the engine then uses the per-row trial countsuccesses + failuresas the binomial size. Aweights =numeric vector of trial counts is also accepted as an alternative API (see Details). Thenbinom2()family fits one log-dispersion parameter per trait (Var = mu + mu^2 / phi). Thetweedie()family fits one log-phi and one logit-(p-1) per trait, with p in (1, 2). TheBeta()andbetabinomial()families each fit one log-precision per trait (a = mu*phi, b = (1-mu)*phi; Smithson & Verkuilen 2006). Thestudent()family fits one log-sigma and one log(df-1) per trait (df > 1; passstudent(df = 3)to fix df at a known value). Thetruncated_*()families require strictly positive integer responses (y >= 1). The delta families fit one log-dispersion of the positive component per trait (sigma_lognormal or phi_gamma).- weights
Optional numeric vector of length
nrow(data). Family- conditional semantics (lme4 / glmmTMB convention):For non-binomial families,
weightsis interpreted as per- observation likelihood multipliers — each row's log-likelihood contribution is multiplied byweights[i]. This matches theweightsargument oflme4::lmer(),glmmTMB::glmmTMB()andstats::glm(). Use this for relative-abundance weighting (sqrt(p_si)), inverse-variance weighting (1 / var_ifor heteroscedastic Gaussian fits), or to down-weight outliers. Must be non-negative and finite;weights[i] = 0zeroes that row's contribution to the joint NLL.For binomial families fit without a
cbind(succ, fail)LHS,weightscontinues to be interpreted as the per-row trial count (binomial size; alternative API tocbind(succ, fail)). This pre-existing semantics is preserved on a per-row basis: in mixed- family fits, binomial rows useweights[i]as their trial count and non-binomial rows use it as the likelihood multiplier. DefaultNULLis equivalent to a length-nrow(data)vector of ones (unweighted Bernoulli on binomial rows). Wide matrix calls throughgllvmTMB_wide()and wide data-frame calls throughtraits()normalise their accepted weight shapes to this same long-format vector before fitting.
- mesh
Optional mesh object from
make_mesh(). Required when the formula includes aspatial()term; ignored otherwise.- phylo_vcv
(legacy global) Optional tip-only
n_species × n_speciesphylogenetic correlation matrix. The canonical syntax isvcv =inside eachphylo_*()keyword.— prefer
tree =. The dense path inverts viaMatrix::solve(), giving the same MLE as the sparse path but atO(n^2)memory andO(n^3)Cholesky cost. Use only when you have a Cphy in hand and no Newick tree (e.g. comparing againstnlme::corPagel). Seedev/design/01-phylo-api-canonical-tree.mdfor the full technical justification.- phylo_tree
(legacy global) Optional
ape::phylotree. The canonical Phase L (May 2026) syntax is to passtree =inside eachphylo_*()keyword (e.g.phylo_latent(species, d = K, tree = tree)); this argument is the older outer-level fallback and is slated for soft deprecation. When supplied (in either form) thephylo_*()terms build a sparse \(\mathbf{A}^{-1}\) over tips + internal nodes viaMCMCglmm::inverseA(tree)(Hadfield & Nakagawa 2010, appendix eqs. 26-29). The result is~5nnon-zeros for ann-tip tree, vsn^2for the dense path. This is the recommended path at anyn_species; the speedup grows to ~24× atn_species = 1000. Requires theMCMCglmmpackage.- known_V
Optional list of block-diagonal sampling-error matrices
V_t. Used by theequalto()two-stage workflow when sampling-error matrices are available from a prior stage of estimation.- lambda_constraint
Optional list with elements
Band / orW, each ann_traits × dmatrix of confirmatory loading constraints (galamm-style).NAentries are estimated; numerical entries are pinned. Upper-triangle entries are silently ignored — the engine's lower-triangular parameterisation already fixes those at zero. DefaultNULLuses the engine's exploratory lower-triangular convention.- control
Output of
gllvmTMBcontrol().- silent
Logical; suppress TMB and gllvmTMB chatter. Default
TRUE.- site
(deprecated) alias for
unit. Kept for backward compatibility. Useunit = ...in new code.- species
(deprecated) alias for
cluster. Kept for backward compatibility. Usecluster = ...in new code.
Value
A gllvmTMB object. With no covariance-structure terms in
the formula the result has class "gllvmTMB" (single-response
engine); with latent() or other covstruct sugar it has class
c("gllvmTMB_multi", "gllvmTMB") (multi-trait engine).
Either way, S3 methods such as tidy(), predict(),
vcov(), logLik() etc. dispatch on gllvmTMB; gllvmTMB_multi-
specific methods (e.g. trait-level extract_ICC_site(),
extract_communality()) are available for multi-trait fits.
Details
| correlation \ mode | scalar | unique | indep | dep | latent |
| none | (omit) | unique() | indep() | dep() | latent() |
| phylo | phylo_scalar() | phylo_unique() | phylo_indep() | phylo_dep() | phylo_latent() |
| spatial | spatial_scalar() | spatial_unique() | spatial_indep() | spatial_dep() | spatial_latent() |
The four "quartet" modes (unique / indep / dep / latent)
encode covstruct intent under a strict always-paired-vs-always-alone
convention:
latent + unique(paired) — the decomposition mode: \(\boldsymbol\Sigma = \boldsymbol\Lambda \boldsymbol\Lambda^\top + \mathbf S\).uniquestandalone is also legitimate (e.g. observation-level random effects), but the canonical pattern is paired withlatent.indep(alone) — the marginal-only mode: each trait gets its own variance, no cross-trait covariance.dep(alone) — the full unstructured mode: \(\boldsymbol\Sigma\) is free with \(T(T+1)/2\) parameters via a Cholesky factor.
Plus the supporting phylo_slope() (random slope), meta_known_V()
(known-V meta-analytic), and the engine-internal propto() /
equalto() covstructs (used by the canonical keywords above; not
typically called directly). See vignette("api-keyword-grid") for a
one-paragraph tour of each cell.
gllvmTMB() parses the glmmTMB-style formula, checks the long-format
data, and dispatches to the underlying TMB template (built on sdmTMB's
SPDE machinery). Covariance-structure terms (latent(), unique(),
propto(), equalto(), spatial()) are processed by extending the
formula parser and the TMB template.
Per the manuscript, when stacking traits one should set
dispformula = ~ 0 so that no implicit residual variance competes
with structured unique(0 + trait | …) terms. gllvmTMB() enforces
this internally.
Multi-trial binomial. The TMB engine evaluates dbinom(y, n_trials, p)
so binomial fits are not restricted to Bernoulli (size = 1). To pass a
per-row trial count, use either:
cbind(successes, failures) ~ ...on the LHS of the formula (canonical R /glmmTMBconvention; recommended), ora flat response
ywithweights = n_trials(the alternative glmmTMB API).
Both interfaces produce identical fits. For Bernoulli data, omit
weights and use a 0/1 response; the engine sets n_trials = 1 and
the likelihood is identical to the previous Bernoulli-only behaviour.
Delta (hurdle) families. delta_lognormal() and delta_gamma()
use a single linear predictor for both components: presence is
\(\Pr(y > 0) = \mathrm{invlogit}(\eta)\) and the positive-component
mean is \(E[y \mid y > 0] = \exp(\eta)\). This matches sdmTMB's
standard delta default (one fixed-effects matrix, one set of trait
random effects). The shared-predictor scheme is the simplest hurdle
formulation; a future release may decouple the two predictors so
presence and abundance can have independent fixed and random effects.
Only type = "standard" (the default) is currently wired in the
multivariate engine; type = "poisson-link" is on the roadmap. Each
delta family carries one per-trait dispersion of the positive
component (no extra Bernoulli dispersion). The response must be
non-negative.
Per-trait residual variance: when does it activate?
"Residual" in a mixed-effects model is scale-relative: what
counts as residual variance shifts as you add levels to the model.
In a no-unique Gaussian fit, the residual is row-level noise
captured by a single shared sigma_eps. Once you add a per-row
unique(0 + trait | obs) term, the row-level residual is now T
per-trait random-effect variances and sigma_eps is auto-suppressed
to avoid double-counting. Once you also add unique(0 + trait | site)
on top, the row-level term remains the residual and the site-level
term is now an additional, higher-level random effect — not a
residual at all. The dispatch table below records this explicitly
for the configurations the engine supports today.
For continuous-family responses (Gaussian, lognormal, Gamma) the
engine has one residual scale parameter, sigma_eps. By default
sigma_eps is estimated as a single shared scalar across all
continuous-family rows — per-trait residual variances only appear
if you explicitly add a per-row unique(...) (or, when Phase B
lands, indep(...)) term. The dispatch is automatic:
- No
unique/indep, continuous families present One shared
sigma_epsacross all rows. Not per-trait.- No
unique/indep, no continuous families sigma_epsis mapped off; the family's intrinsic dispersion handles the residual.unique(0 + trait | g)whereghas fewer levels than rows (e.g.g = "site")sigma_epsis still estimated as the row-level residual; theuniqueterm adds a per-trait random effect at levelgon top.unique(0 + trait | obs)at the per-row level (one level per row), continuous families fittedsigma_epsis auto-suppressed (mapped off, fixed at a tiny stabiliser); the T per-traituniquerandom effects are the residual. A one-shotcli::cli_informfires at fit time announcing the auto-suppression.unique(0 + trait | obs)at the per-row level, non-Gaussian or mixed-family fitTreated as observation-level random effects (OLRE). For Bernoulli traits the OLRE is statistically unidentifiable and is mapped off; for hurdle / delta families a warning is emitted (see "Per-family-aware OLRE selection" below).
indep(0 + trait | obs)at the per-row level (Phase B, not yet implemented)Mathematically equivalent to the per-row
uniquecase; same auto-suppression dispatch.
Mnemonic: continuous-family sigma_eps is the default; per-row
unique / indep replaces it with T per-trait residuals; non-per-row
unique / indep adds a higher-level random effect on top of
sigma_eps; non-continuous families never carry sigma_eps
regardless.
Per-family-aware OLRE selection. When unique(0 + trait | <unit_obs>)
is at per-row resolution (i.e. one row per (trait, unit_obs) cell),
the resulting per-trait random effects on the linear predictor are an
observation-level random effect (OLRE). The engine now decides per
trait what to do with the OLRE variance based on the trait's response
family:
single-trial Bernoulli (
binomial(), all rows haven_trials == 1):theta_diag_W[t]and the correspondings_Wcolumn are mapped off. The OLRE is statistically unidentifiable here (no within-cell replication) and the MLE is the trivial boundary \(\sigma_W \to 0\); pinning the parameter at \(\sigma_W \approx 10^{-6}\) removes the spurious free parameter and emits a one-shot informational message. Multi-trial binomial (viacbind(succ, fail)orweights = n_trials) is identifiable and fit normally.delta / hurdle families (
delta_lognormal(),delta_gamma()): OLRE is fit as before, but a one-shot warning is emitted because the OLRE acts on the shared linear predictor of the hurdle and mixes presence and positive-component noise.all other families: OLRE is fit as before.
In a mixed-family fit (per-row family list) the per-trait decision applies only to the affected trait. References: Nakagawa & Schielzeth (2010) Biol. Rev. 85: 935-956; Nakagawa, Johnson & Schielzeth (2017) J. R. Soc. Interface 14: 20170213.
See also
traits() for wide data-frame formula input;
gllvmTMB_wide() for wide matrix/data-frame input;
simulate_site_trait() for
generating recovery test data;
extract_Sigma() for the unified post-fit covariance API;
gllvmTMB_diagnose() for a one-stop convergence + identifiability
health check; ordinal_probit() for the gllvmTMB-native ordinal
threshold family.
Examples
if (FALSE) { # \dontrun{
set.seed(1)
sim <- simulate_site_trait(n_sites = 50, n_species = 8, n_traits = 3,
mean_species_per_site = 5)
fit <- gllvmTMB(
value ~ 0 + trait + (0 + trait):env_1 + (0 + trait):env_2,
data = sim$data,
family = gaussian()
)
summary(fit)
} # }
