
Formula-LHS marker that lets gllvmTMB() accept a wide data frame (one row per individual, one column per trait) instead of the canonical long-format (unit, trait) data.

Usage

traits(...)

Arguments

...

Column-selection expression(s) passed verbatim to tidyr::pivot_longer(cols = ...). Bare names or any tidyselect verb (all_of(), starts_with(), matches(), etc.) are accepted.

Value

A formula marker; never evaluated as a function call. The parser recognises traits(...) on the LHS of a gllvmTMB() formula and dispatches to the wide-format pivot pre-pass.

Details

The package works with two data shapes, long and wide, through three entry points:

  • long: gllvmTMB(value ~ ..., data = df_long, ...) – one row per (unit, trait) observation.

  • wide data frame: gllvmTMB(traits(t1, t2, ...) ~ ..., data = df_wide, ...) – one row per unit, one column per trait, with compact formula syntax.

  • wide matrix: gllvmTMB_wide(Y, ...) – a numeric matrix or data frame wrapper for matrix-first workflows.

All paths reach the same long-format engine; the user picks whichever shape matches their data on disk.

Because the LHS already names the response traits, the RHS can use a compact wide shorthand. 1 expands to the trait-specific intercepts 0 + trait; ordinary predictors such as env_temp expand to (0 + trait):env_temp; and latent(1 | individual) / unique(1 | individual) expand to the long covariance syntax latent(0 + trait | individual) / unique(0 + trait | individual). The same 1 | group shorthand is recognised for indep(), dep(), bar-style phylo_indep() / phylo_dep(), and the spatial_*() keywords. Species-axis phylogenetic keywords such as phylo_latent(species, d = K) already name their phylogenetic axis and pass through unchanged. Ordinary random-intercept terms such as (1 | batch) also pass through unchanged.

gllvmTMB(
  traits(sleep, mass, lifespan, brain) ~ 1 + env_temp +
    latent(1 | individual, d = 2) +
    unique(1 | individual),
  data   = wide_df,
  unit   = "individual",
  family = gaussian()
)
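For comparison, the expansion rules described above imply the long-format formula sketched below. This is written by hand from those rules, not captured from the parser; in particular, carrying d = 2 unchanged into the expanded latent() term is an assumption. Formulas are unevaluated in R, so the snippet runs without the package loaded.

```r
# Compact wide formula, as used in the call above.
wide_formula <- traits(sleep, mass, lifespan, brain) ~ 1 + env_temp +
  latent(1 | individual, d = 2) +
  unique(1 | individual)

# Long-format formula implied by the stated expansion rules:
#   traits(...)             -> .y_wide_
#   1                       -> 0 + trait
#   env_temp                -> (0 + trait):env_temp
#   latent(1 | individual)  -> latent(0 + trait | individual)
#   unique(1 | individual)  -> unique(0 + trait | individual)
long_formula <- .y_wide_ ~ 0 + trait + (0 + trait):env_temp +
  latent(0 + trait | individual, d = 2) +
  unique(0 + trait | individual)

# Variables each formula refers to.
wide_vars <- all.vars(wide_formula)
long_vars <- all.vars(long_formula)
```

The wide formula names the trait columns directly, while the long formula refers only to the stacked response .y_wide_ and the trait factor.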

Internally traits() is implemented as a tidyr::pivot_longer() pre-pass: the wide data is pivoted to long format with trait as a factor column (levels in the order the user supplied to traits()) and .y_wide_ as the response column; the LHS of the formula is rewritten from traits(...) to .y_wide_; and the compact RHS is expanded to the trait-stacked long syntax before dispatch. The explicit long RHS remains accepted, so existing calls that already write 0 + trait and latent(0 + trait | group) keep working.
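The pivot step can be approximated directly with tidyr. The sketch below uses a hypothetical toy data frame and only illustrates the shape of the result (a trait factor plus a .y_wide_ response); it is not the package's internal code.

```r
library(tidyr)

# Hypothetical toy data: one row per individual, one column per trait.
wide_df <- data.frame(
  individual = c("a", "b", "c"),
  env_temp   = c(10.5, 12.0, 9.8),
  sleep      = c(8.1, 7.4, 9.0),
  mass       = c(2.3, 3.1, 1.9)
)

# Trait columns in the order they would be supplied to traits().
trait_cols <- c("sleep", "mass")

# Stack the trait columns: names go to `trait`, values to `.y_wide_`.
long_df <- pivot_longer(
  wide_df,
  cols           = all_of(trait_cols),
  names_to       = "trait",
  values_to      = ".y_wide_",
  values_drop_na = TRUE
)

# Make `trait` a factor whose levels follow the supplied order.
long_df$trait <- factor(long_df$trait, levels = trait_cols)
```

Non-trait columns such as individual and env_temp are repeated once per trait, which is what lets the expanded RHS refer to them in long format.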

Tidyselect verbs are supported because traits() forwards its arguments to tidyr::pivot_longer(cols = ...): traits(all_of(cols)), traits(starts_with("sp")), traits(matches("^y[0-9]+$")), traits(any_of(c("a", "b"))), and bare names all work.
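As an illustration of why these forms are interchangeable, the sketch below passes four different selections to tidyr::pivot_longer() on a hypothetical data frame and obtains the same long table each time. It exercises tidyr only, on the assumption that traits() forwards its arguments as described.

```r
library(tidyr)

# Hypothetical site-by-species counts in wide format.
wide_df <- data.frame(
  site = c("s1", "s2"),
  sp1  = c(3, 0),
  sp2  = c(1, 4),
  sp3  = c(0, 2)
)
cols <- c("sp1", "sp2", "sp3")

# Four ways of selecting the same three columns.
by_name   <- pivot_longer(wide_df, cols = c(sp1, sp2, sp3),
                          names_to = "trait", values_to = ".y_wide_")
by_prefix <- pivot_longer(wide_df, cols = starts_with("sp"),
                          names_to = "trait", values_to = ".y_wide_")
by_vector <- pivot_longer(wide_df, cols = all_of(cols),
                          names_to = "trait", values_to = ".y_wide_")
by_regex  <- pivot_longer(wide_df, cols = matches("^sp[0-9]+$"),
                          names_to = "trait", values_to = ".y_wide_")
```

all_of() is the usual choice when the column names live in a character vector, because it errors on a missing column instead of silently skipping it.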

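Cells with NA responses are dropped via pivot_longer(values_drop_na = TRUE), the canonical default, so a unit with some missing traits still contributes its observed cells. Users who want strict listwise deletion (dropping any unit that has a missing trait) should pre-filter the wide data before calling gllvmTMB().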
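The difference between cell-wise dropping and a listwise pre-filter can be seen on a small hypothetical data frame. The sketch uses tidyr and base R only; the pre-filter with complete.cases() is one possible approach, not a package requirement.

```r
library(tidyr)

# Hypothetical wide data with one missing cell (individual "b", trait sleep).
wide_df <- data.frame(
  individual = c("a", "b", "c"),
  sleep      = c(8.1, NA, 9.0),
  mass       = c(2.3, 3.1, 1.9)
)

# Cell-wise: only the missing (individual, trait) cell is removed.
cellwise <- pivot_longer(
  wide_df,
  cols           = c(sleep, mass),
  names_to       = "trait",
  values_to      = ".y_wide_",
  values_drop_na = TRUE
)

# Listwise: drop every individual with any missing trait before pivoting.
complete_df <- wide_df[complete.cases(wide_df[, c("sleep", "mass")]), ]
listwise <- pivot_longer(
  complete_df,
  cols      = c(sleep, mass),
  names_to  = "trait",
  values_to = ".y_wide_"
)

nrow(cellwise)  # 5 rows: "b" keeps its observed mass value
nrow(listwise)  # 4 rows: "b" is removed entirely
```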

Mixed-family fits (family = list(...) keyed by trait) flow through the long-format engine; traits() does not intercept the family argument. Per-row weight vectors of length nrow(data) are replicated across traits automatically, then passed to the same long-format weight path used by gllvmTMB(). For per-cell weight matrices, use the matrix-in entry point gllvmTMB_wide().
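What replicating per-row weights across traits means can be sketched with tidyr by carrying the weights through the pivot as an extra column. The column name .w and the mechanism shown are assumptions made for illustration; the package's internal weight handling may differ.

```r
library(tidyr)

# Hypothetical wide data and per-row weights (length nrow(wide_df)).
wide_df <- data.frame(
  individual = c("a", "b"),
  sleep      = c(8.1, 7.4),
  mass       = c(2.3, 3.1)
)
w <- c(1, 0.5)

# Attach the weights as a column (hypothetical name `.w`) so that the
# pivot repeats each row's weight once per trait.
wide_df$.w <- w

long_df <- pivot_longer(
  wide_df,
  cols      = c(sleep, mass),
  names_to  = "trait",
  values_to = ".y_wide_"
)

long_df$.w  # 1.0 1.0 0.5 0.5: each row weight appears once per trait
```

A weight that differs between traits within the same row cannot be expressed this way, which is the case the text directs to gllvmTMB_wide().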

See also

gllvmTMB() for the long-format engine, gllvmTMB_wide() for the matrix-in API (use that when you have per-cell weight matrices or come from a gllvm-style workflow). The source-tree contract is docs/design/02-data-shape-and-weights.md.