Skip to contents

This developer article documents the supported drm_formula() grammar, including response blocks, distributional-parameter formulae, and validation rules. bf() remains a short alias.

The design reference is docs/design/01-formula-grammar.md.

Fixed-effect formulas use base R’s ordinary formula machinery. Transformations and interaction expansions such as poly(x1, 2), I(x1^2), x1 * x2, and (x1 + x2 + x3)^2 work for implemented fixed-effect distributional-parameter formulas. For ecological and evolutionary examples, second-order terms are usually the practical default; third-order polynomials should be treated as exceptional and justified by the question and data.

Current status map

Use this table before copying syntax into an analysis. “Implemented” means the syntax is parsed and has a fitted likelihood path for the family named in the notes. “Reserved” means the syntax is part of the public design vocabulary but should currently give a clear unsupported-feature error rather than a fit. “Planned” means the syntax is shown only to explain the roadmap.

In this table, “coscale” means a model for residual correlation, currently rho12 in two-response Gaussian models.

Syntax Current status Notes
drm_formula() and bf() Implemented drm_formula() is the explicit constructor; bf() is a short alias.
y ~ x1 + x2, sigma ~ x1 Implemented Univariate Gaussian location-scale model.
y ~ x1 + x2, sigma ~ x1, nu ~ x2 Implemented Fixed-effect univariate Student-t location-scale-shape model.
y ~ x1 + x2, sigma ~ x1, family = lognormal() Implemented Fixed-effect univariate lognormal model for positive responses; mu and sigma are on the log-response scale.
y ~ x1 + x2, sigma ~ x1, family = Gamma(link = "log") Implemented Fixed-effect univariate Gamma mean-CV model for positive responses; mu is the response mean and sigma is the coefficient of variation.
y ~ x1 + x2, sigma ~ x1, family = beta() Implemented Fixed-effect beta mean-scale model for strict continuous proportions in (0, 1); public sigma maps internally to phi = 1 / sigma^2.
y ~ x1 + x2, family = poisson(link = "log") Implemented Fixed-effect univariate Poisson mean model for non-negative integer counts.
y ~ x1 + (1 | id) + (0 + x1 | id), family = poisson(link = "log") Implemented first slice Ordinary Poisson mu random intercepts and independent numeric slopes on the log-mean scale. Correlated Poisson slope blocks, labelled covariance blocks, zero-inflated Poisson random effects, and cross-parameter covariance remain planned.
y ~ x1 + offset(log(exposure)), family = poisson(link = "log") Implemented Exposure/rate Poisson model using standard R offset() syntax in the mu formula.
y ~ x1 + x2, zi ~ x1, family = poisson(link = "log") Implemented Fixed-effect zero-inflated Poisson model; mu is the conditional count mean and zi is the structural-zero probability.
y ~ x1 + x2, sigma ~ x1, family = nbinom2() Implemented Fixed-effect univariate negative-binomial 2 model for overdispersed counts; sigma is an overdispersion scale.
y ~ x1 + (1 | id) + (0 + x1 | id), sigma ~ x2, family = nbinom2() Implemented first slice Ordinary NB2 mu random intercepts and independent numeric slopes on the log-mean scale. Correlated NB2 slope blocks, labelled covariance blocks, zero-inflated NB2 random effects, and NB2 sigma random effects remain planned.
y ~ x1 + offset(log(exposure)), sigma ~ x2, family = nbinom2() Implemented Exposure/rate NB2 model; the offset enters the mu linear predictor and sigma remains overdispersion.
y ~ x1 + x2, sigma ~ x1, zi ~ x2, family = nbinom2() Implemented Fixed-effect zero-inflated NB2 model; mu and sigma describe the conditional count component and zi is structural-zero probability.
y ~ x1 + x2, sigma ~ x1, family = truncated_nbinom2() Implemented Fixed-effect zero-truncated NB2 model for positive counts; mu and sigma describe the untruncated count component and fitted() is the positive-count mean.
y ~ x1 + x2, sigma ~ x1, hu ~ x2, family = truncated_nbinom2() Implemented Fixed-effect hurdle NB2 model; hu is the hurdle-zero probability and nonzero counts come from the zero-truncated NB2 component.
(1 | id) in mu Implemented Ordinary Gaussian location random intercept.
(0 + x1 | id) in mu Implemented Independent numeric location random slope.
(1 + x1 | id) in mu Implemented Correlated ordinary intercept-slope block.
(1 + x1 | p | id) in mu Implemented for naming p labels the group-level covariance block for output and future matching.
(1 | id) in sigma Implemented Residual-scale random intercept in the univariate Gaussian sigma formula.
(1 | p | id) in both mu and sigma Implemented First univariate location-scale covariance slice; p labels one mean-scale random-intercept covariance block.
sd(id) ~ x_group Implemented Models the SD of an unlabelled Gaussian mu random intercept; predictors must be group-level.
meta_known_V(V = V) Implemented Known diagonal, block-diagonal, or dense sampling covariance in a Gaussian model; bivariate Gaussian known V uses a complete-row 2n by 2n row-paired matrix.
mu1, mu2, sigma1, sigma2, rho12 Implemented for fixed effects Bivariate Gaussian location-coscale model with predictor-dependent residual correlation.
(1 | p | id) in both bivariate mu1 and mu2 Implemented First bivariate group-level covariance slice; p labels a mu1/mu2 random-intercept covariance block.
(1 | p | id) in both bivariate sigma1 and sigma2 Implemented First bivariate residual-scale covariance slice; p labels a sigma1/sigma2 random-intercept covariance block on the log-sigma scale.
(1 | p | id) in same-response bivariate mu1 and sigma1, or in mu2 and sigma2 Implemented first slice One labelled random-intercept pair creates a response-specific mean-scale covariance block.
(1 | p | id) in all four bivariate mu1, mu2, sigma1, and sigma2 formulas Implemented first slice One ordinary q=4 location-scale covariance block reports all six latent random-effect correlations.
sd1(id) ~ x_group, sd2(id) ~ x_group Implemented Family B direct SD formulas for labelled bivariate location random intercepts; not for residual-scale random effects.
family = c(gaussian(), gaussian()) Implemented Routes to the bivariate Gaussian engine; mixed composed families are planned.
biv_gaussian() Implemented legacy helper Kept for compatibility while composed families become the public direction.
mvbind(y1, y2) ~ x1 Implemented Shorthand for identical bivariate location formulas; explicit mu1/mu2 remains preferred for different predictors.
phylo(1 | species, tree = tree) in mu Implemented Intercept-only univariate Gaussian phylogenetic location effect; requires an ultrametric tree with branch lengths.
matching phylo(1 | species, tree = tree) in bivariate mu1 and mu2 Implemented first slice Estimates two phylogenetic location SDs and one phylogenetic mean-mean correlation.
labelled phylo(1 | p | species, tree = tree) in matching bivariate mu1 and mu2 Implemented The label is preserved in SD, correlation, corpairs(), and profile-target names for the phylogenetic mean-mean path.
labelled phylo(1 | p | species, tree = tree) in all four bivariate mu1, mu2, sigma1, and sigma2 formulas Implemented first slice One constant q=4 phylogenetic location-scale block estimates four endpoint SDs and six latent phylogenetic correlations. Partial, unlabelled, mismatched, and slope forms remain rejected.
y ~ x1 + x2, family = cumulative_logit() Implemented Fixed-effect univariate ordinal model for ordered scores with cutpoints; ordinal random effects and scale/discrimination formulas are planned.
cbind(successes, failures) ~ x1, family = beta_binomial() Implemented Fixed-effect denominator-aware model for success counts with known trial totals; sigma is extra-binomial variation.
phylo(1 + x1 | species, tree = tree) Planned Structured slopes come after the intercept-only path is fully hardened. The first path should fit one structured mu slope; two slopes are the near-term upper bound.
animal(1 + x | id, pedigree = ped) Planned marker grammar Parsed as a future animal-model one-slope route, but rejected before fitting until animal intercepts, diagnostics, profile targets, and recovery tests exist.
spatial(1 | site, coords = coords) Implemented first slice Univariate Gaussian mu spatial random intercept using a fixed coordinate covariance foundation; mesh/SPDE paths remain planned.
spatial(1 + x | site, coords = coords) Implemented first one-slope slice Univariate Gaussian mu spatial random intercept plus one numeric spatial slope. The two coordinate fields are independent and have separate SDs.
relmat(1 + x | id, K = K) Planned marker grammar Parsed as a future lower-level relatedness one-slope route, but rejected before fitting until matrix validation, diagnostics, profile targets, and recovery tests exist.
corpair(id, level = "group", block = "p", from = "mu1", to = "mu2") ~ x_group Implemented Predictor-dependent ordinary q=2 location-location latent random-effect correlation regression for matching labelled mu1/mu2 random intercepts. Predictors must be constant within id.
corpair(species, level = "phylogenetic", block = "p", from = "mu1", to = "mu2") ~ ecology Implemented Predictor-dependent phylogenetic q=2 location-location latent random-effect correlation regression for matching labelled mu1/mu2 phylo() terms. Predictors must be constant within species. Location-scale, scale-scale, q=4, and spatial corpair() regressions remain planned.
Bivariate random slopes, non-Gaussian cross-parameter covariance, spatial q=4 blocks, predictor-dependent q=4 phylogenetic/spatial correlations, or rho12 random effects Planned These need a larger covariance parameterization, simulation, and naming checks before fitting. Intercept-slope corpair() rows are distant-future; a later slope1-slope2 bivariate plasticity-syndrome target needs coefficient-aware syntax.
corpairs(fit) Implemented for current fitted correlations Reports residual rho12, ordinary and bivariate group-level covariance rows including ordinary q=4 blocks, and bivariate phylogenetic mean-mean or q=4 rows; spatial pair classes remain planned.

Current implemented univariate Gaussian random-effect forms include:

drm_formula(y ~ x + (1 | id), sigma ~ z)
drm_formula(y ~ x + (0 + x | id), sigma ~ z)
drm_formula(y ~ x + (1 + x | id), sigma ~ z)
drm_formula(y ~ x + (1 + x | p | id), sigma ~ z)
drm_formula(y ~ x + (1 | id), sigma ~ z + (1 | id) + (0 + w | id))
drm_formula(y ~ x + (1 | p | id), sigma ~ z + (1 | p | id))
drm_formula(y ~ x + (1 | id), sigma ~ z, sd(id) ~ x_group)
drm_formula(y ~ x + phylo(1 | species, tree = tree), sigma ~ z)
drm_formula(
  y ~ x + (1 | id) + (1 | site),
  sigma ~ z,
  sd(id) ~ x_group,
  sd(site) ~ site_type
)

Current implemented Student-t form:

drm_formula(y ~ x, sigma ~ z, nu ~ 1)
drm_formula(y ~ x, sigma ~ z, nu ~ x)

Student-t random effects, known sampling covariance, phylogenetic terms, and bivariate Student-t models are later phases.

Future skew-normal and skew-t formulas should also start with fixed-effect residual-shape terms such as nu ~ x. A later latent-effect spelling such as skew(id) ~ x_group would ask a different question about the distribution of group effects and should not be taught as an alias for residual nu.

Current implemented lognormal form:

drm_formula(biomass ~ habitat, sigma ~ treatment)

paired with:

family = lognormal()

The response must be positive and finite. mu and sigma are defined on the log-response scale, and fitted() returns the arithmetic response mean exp(mu + sigma^2 / 2).

Current implemented Gamma form:

drm_formula(biomass ~ habitat, sigma ~ treatment)

paired with:

family = Gamma(link = "log")

Here mu is the response mean and sigma is the coefficient of variation. Gamma random effects, known sampling covariance, phylogenetic terms, and bivariate or mixed Gamma models are later phases.

The middle p is a group-level covariance-block label. It is not residual rho12, and reserved distributional parameter names such as rho12 should not be used as labels. Matching (1 | p | id) terms in univariate Gaussian mu and sigma fit the first mean-scale covariance block; larger cross-formula blocks with slopes remain future work.

Current implemented bivariate form:

drm_formula(
  mu1 = y1 ~ x1 + x2,
  mu2 = y2 ~ x1,
  sigma1 = ~ x1 + x2,
  sigma2 = ~ x1,
  rho12 = ~ x1 + x2
)

When both responses share the same location predictors, the shorter form is also implemented:

drm_formula(
  mvbind(y1, y2) ~ x1 + x2,
  sigma1 = ~ x1 + x2,
  sigma2 = ~ x1,
  rho12 = ~ x1 + x2
)

This expands internally to mu1 = y1 ~ x1 + x2 and mu2 = y2 ~ x1 + x2. Use the explicit mu1 and mu2 formulas when the two responses need different location predictors.

The first fitted structured-effect marker is an intercept-only phylogenetic location effect:

drm_formula(y ~ x + phylo(1 | species, tree = tree), sigma ~ z)

phylo() uses an ultrametric branch-length tree and adds a tree-structured Gaussian random intercept to the mu predictor. Matching intercept-only phylo() terms in bivariate mu1 and mu2 formulas fit the first phylogenetic mean-mean correlation slice:

drm_formula(
  mu1 = y1 ~ x + phylo(1 | species, tree = tree),
  mu2 = y2 ~ x + phylo(1 | species, tree = tree),
  sigma1 = ~ 1,
  sigma2 = ~ 1,
  rho12 = ~ 1
)

Phylogenetic random slopes, phylogenetic scale terms, structured effects in rho12, animal-model slopes, lower-level relmat() slopes, multiple spatial slopes, and mesh-based spatial terms are still planned:

# planned; drmTMB() will currently reject these
drm_formula(y ~ x + phylo(1 + x | species, tree = tree), sigma ~ z)
drm_formula(y ~ x + animal(1 + x | id, pedigree = ped), sigma ~ z)
drm_formula(y ~ x + relmat(1 + x | id, K = K), sigma ~ z)
drm_formula(y ~ x + spatial(1 | site, mesh = mesh), sigma ~ z)

The first fitted spatial() paths use coordinates directly for univariate Gaussian mu coefficient fields:

drmTMB(
  bf(y ~ x + spatial(1 | site, coords = coords), sigma ~ z),
  family = gaussian(),
  data = dat
)

drmTMB(
  bf(y ~ x + spatial(1 + x | site, coords = coords), sigma ~ z),
  family = gaussian(),
  data = dat
)

coords can have one row per site or one row per observation, but coordinates must be constant within each site. A precomputed mesh remains planned. Mesh is not the scientific sampling level. It is the computational scaffold that will keep a future SPDE/GMRF spatial field scalable. These structured effects are separate from residual rho12.

For structured slopes, the fitted coordinate-spatial path now supports one numeric mu slope as independent intercept and slope fields with separate SDs. Additional structured slopes remain the advanced path after more recovery evidence. Multiple random factors should remain separate additive blocks. The first slope models should not estimate intercept-slope corpair() rows; the more interesting future bivariate target is a slope1-slope2 correlation for the same covariate across responses.

The first bivariate group-level covariance syntax is explicit about the distinction between group-level covariance labels and residual rho12:

drm_formula(
  mu1 = y1 ~ x1 + x2 + (1 | p | ID),
  mu2 = y2 ~ x1      + (1 | p | ID),
  sigma1 = ~ x1 + x2 + (1 | q | ID),
  sigma2 = ~ x1      + (1 | q | ID),
  rho12 = ~ x1 + x2
)

The shared p label fits a group-level random-intercept covariance block for the two response means. The shared q label fits a separate scale-scale block for the two residual-scale random intercepts on the log-sigma scale. The first same-response mean-scale bridge is also implemented, using a shared label in mu1 and sigma1, or in mu2 and sigma2. Reusing the same label and group in all four bivariate formulas fits the first ordinary q=4 location-scale block:

drm_formula(
  mu1 = y1 ~ x + (1 | p | ID),
  mu2 = y2 ~ x + (1 | p | ID),
  sigma1 = ~ z + (1 | p | ID),
  sigma2 = ~ z + (1 | p | ID),
  rho12 = ~ w
)

That q=4 block reports one location-location row, four location-scale rows, and one scale-scale row in corpairs(). Fitted rows still use the older mean-mean and mean-scale class names, but corpairs() accepts location-location and location-scale as filter aliases to match the reserved corpair() formula wording. The first fitted corpair() route is the ordinary q=2 location-location model: corpair(ID, level = "group", block = "p", from = "mu1", to = "mu2") ~ w. The predictor w must be constant within ID, and corpairs() reports the mean, range, and number of fitted group-level correlations. Later location-scale routes should also be endpoint-specific, for example corpair(ID, level = "group", block = "p", from = "mu1", to = "sigma2") ~ w, because the four location-scale pairs in q=4 models do not have identical interpretation. For phylogenetic life-history trade-off models, the same grammar now also covers the fitted q=2 location-location route: corpair(species, level = "phylogenetic", block = "p", from = "mu1", to = "mu2") ~ ecology. The phylogenetic formula has an extra design gate: because the tree couples all species, drmTMB needs a positive-definite covariance contract for the full species block before a species-level predictor can change the latent phylogenetic correlation. The selected q=2 contract uses two independent unit phylogenetic fields and species-specific loadings, so variable rho_l is a nonstationary phylogenetic covariance model rather than a residual-style correlation row. It covers mu1-mu2 only; phylogenetic location-scale and scale-scale correlation regressions remain q=4 extensions. Bivariate random slopes, spatial q=4 blocks, predictor-dependent q=4 phylogenetic correlations, and rho12 random effects remain planned. Double-hierarchical bivariate fits should continue to report correlation pairs in a long table, not just as unlabeled matrices. The design note docs/design/20-coscale-correlation-pairs.md records the planned labels: correlation level, group, covariance block, distributional parameters, responses, and random-effect coefficients. The current corpairs(fit) helper already uses this table shape for residual rho12, ordinary univariate Gaussian mu random-effect correlations, and the implemented bivariate mean-mean, scale-scale, same-response mean-scale, ordinary q=4, and phylogenetic mean-mean random-intercept correlations.