This developer article documents the supported
drm_formula() grammar, including response blocks,
distributional-parameter formulae, and validation rules.
bf() remains a short alias.
The design reference is
docs/design/01-formula-grammar.md.
Fixed-effect formulas use base R’s ordinary formula machinery.
Transformations and interaction expansions such as
poly(x1, 2), I(x1^2), x1 * x2,
and (x1 + x2 + x3)^2 work for implemented fixed-effect
distributional-parameter formulas. For ecological and evolutionary
examples, second-order terms are usually the practical default;
third-order polynomials should be treated as exceptional and justified
by the question and data.
Current status map
Use this table before copying syntax into an analysis. “Implemented” means the syntax is parsed and has a fitted likelihood path for the family named in the notes. “Reserved” means the syntax is part of the public design vocabulary but should currently give a clear unsupported-feature error rather than a fit. “Planned” means the syntax is shown only to explain the roadmap.
In this table, “coscale” means a model for residual correlation,
currently rho12 in two-response Gaussian models.
| Syntax | Current status | Notes |
|---|---|---|
drm_formula() and bf()
|
Implemented |
drm_formula() is the explicit constructor;
bf() is a short alias. |
y ~ x1 + x2, sigma ~ x1
|
Implemented | Univariate Gaussian location-scale model. |
y ~ x1 + x2, sigma ~ x1,
nu ~ x2
|
Implemented | Fixed-effect univariate Student-t location-scale-shape model. |
y ~ x1 + x2, sigma ~ x1,
family = lognormal()
|
Implemented | Fixed-effect univariate lognormal model for positive responses;
mu and sigma are on the log-response
scale. |
y ~ x1 + x2, sigma ~ x1,
family = Gamma(link = "log")
|
Implemented | Fixed-effect univariate Gamma mean-CV model for positive responses;
mu is the response mean and sigma is the
coefficient of variation. |
y ~ x1 + x2, sigma ~ x1,
family = beta()
|
Implemented | Fixed-effect beta mean-scale model for strict continuous proportions
in (0, 1); public sigma maps internally to
phi = 1 / sigma^2. |
y ~ x1 + x2,
family = poisson(link = "log")
|
Implemented | Fixed-effect univariate Poisson mean model for non-negative integer counts. |
y ~ x1 + (1 | id) + (0 + x1 | id),
family = poisson(link = "log")
|
Implemented first slice | Ordinary Poisson mu random intercepts and independent
numeric slopes on the log-mean scale. Correlated Poisson slope blocks,
labelled covariance blocks, zero-inflated Poisson random effects, and
cross-parameter covariance remain planned. |
y ~ x1 + offset(log(exposure)),
family = poisson(link = "log")
|
Implemented | Exposure/rate Poisson model using standard R offset()
syntax in the mu formula. |
y ~ x1 + x2, zi ~ x1,
family = poisson(link = "log")
|
Implemented | Fixed-effect zero-inflated Poisson model; mu is the
conditional count mean and zi is the structural-zero
probability. |
y ~ x1 + x2, sigma ~ x1,
family = nbinom2()
|
Implemented | Fixed-effect univariate negative-binomial 2 model for overdispersed
counts; sigma is an overdispersion scale. |
y ~ x1 + (1 | id) + (0 + x1 | id),
sigma ~ x2, family = nbinom2()
|
Implemented first slice | Ordinary NB2 mu random intercepts and independent
numeric slopes on the log-mean scale. Correlated NB2 slope blocks,
labelled covariance blocks, zero-inflated NB2 random effects, and NB2
sigma random effects remain planned. |
y ~ x1 + offset(log(exposure)),
sigma ~ x2, family = nbinom2()
|
Implemented | Exposure/rate NB2 model; the offset enters the mu
linear predictor and sigma remains overdispersion. |
y ~ x1 + x2, sigma ~ x1,
zi ~ x2, family = nbinom2()
|
Implemented | Fixed-effect zero-inflated NB2 model; mu and
sigma describe the conditional count component and
zi is structural-zero probability. |
y ~ x1 + x2, sigma ~ x1,
family = truncated_nbinom2()
|
Implemented | Fixed-effect zero-truncated NB2 model for positive counts;
mu and sigma describe the untruncated count
component and fitted() is the positive-count mean. |
y ~ x1 + x2, sigma ~ x1,
hu ~ x2, family = truncated_nbinom2()
|
Implemented | Fixed-effect hurdle NB2 model; hu is the hurdle-zero
probability and nonzero counts come from the zero-truncated NB2
component. |
(1 | id) in mu
|
Implemented | Ordinary Gaussian location random intercept. |
(0 + x1 | id) in mu
|
Implemented | Independent numeric location random slope. |
(1 + x1 | id) in mu
|
Implemented | Correlated ordinary intercept-slope block. |
(1 + x1 | p | id) in mu
|
Implemented for naming |
p labels the group-level covariance block for output
and future matching. |
(1 | id) in sigma
|
Implemented | Residual-scale random intercept in the univariate Gaussian
sigma formula. |
(1 | p | id) in both mu and
sigma
|
Implemented | First univariate location-scale covariance slice; p
labels one mean-scale random-intercept covariance block. |
sd(id) ~ x_group |
Implemented | Models the SD of an unlabelled Gaussian mu random
intercept; predictors must be group-level. |
meta_known_V(V = V) |
Implemented | Known diagonal, block-diagonal, or dense sampling covariance in a
Gaussian model; bivariate Gaussian known V uses a
complete-row 2n by 2n row-paired matrix. |
mu1, mu2, sigma1,
sigma2, rho12
|
Implemented for fixed effects | Bivariate Gaussian location-coscale model with predictor-dependent residual correlation. |
(1 | p | id) in both bivariate mu1 and
mu2
|
Implemented | First bivariate group-level covariance slice; p labels
a mu1/mu2 random-intercept covariance
block. |
(1 | p | id) in both bivariate sigma1 and
sigma2
|
Implemented | First bivariate residual-scale covariance slice; p
labels a sigma1/sigma2 random-intercept
covariance block on the log-sigma scale. |
(1 | p | id) in same-response bivariate
mu1 and sigma1, or in mu2 and
sigma2
|
Implemented first slice | One labelled random-intercept pair creates a response-specific mean-scale covariance block. |
(1 | p | id) in all four bivariate mu1,
mu2, sigma1, and sigma2
formulas |
Implemented first slice | One ordinary q=4 location-scale covariance block reports all six latent random-effect correlations. |
sd1(id) ~ x_group, sd2(id) ~ x_group
|
Implemented | Family B direct SD formulas for labelled bivariate location random intercepts; not for residual-scale random effects. |
family = c(gaussian(), gaussian()) |
Implemented | Routes to the bivariate Gaussian engine; mixed composed families are planned. |
biv_gaussian() |
Implemented legacy helper | Kept for compatibility while composed families become the public direction. |
mvbind(y1, y2) ~ x1 |
Implemented | Shorthand for identical bivariate location formulas; explicit
mu1/mu2 remains preferred for different
predictors. |
phylo(1 | species, tree = tree) in mu
|
Implemented | Intercept-only univariate Gaussian phylogenetic location effect; requires an ultrametric tree with branch lengths. |
matching phylo(1 | species, tree = tree) in bivariate
mu1 and mu2
|
Implemented first slice | Estimates two phylogenetic location SDs and one phylogenetic mean-mean correlation. |
labelled phylo(1 | p | species, tree = tree) in
matching bivariate mu1 and mu2
|
Implemented | The label is preserved in SD, correlation, corpairs(),
and profile-target names for the phylogenetic mean-mean path. |
labelled phylo(1 | p | species, tree = tree) in all
four bivariate mu1, mu2, sigma1,
and sigma2 formulas |
Implemented first slice | One constant q=4 phylogenetic location-scale block estimates four endpoint SDs and six latent phylogenetic correlations. Partial, unlabelled, mismatched, and slope forms remain rejected. |
y ~ x1 + x2,
family = cumulative_logit()
|
Implemented | Fixed-effect univariate ordinal model for ordered scores with cutpoints; ordinal random effects and scale/discrimination formulas are planned. |
cbind(successes, failures) ~ x1,
family = beta_binomial()
|
Implemented | Fixed-effect denominator-aware model for success counts with known
trial totals; sigma is extra-binomial variation. |
phylo(1 + x1 | species, tree = tree) |
Planned | Structured slopes come after the intercept-only path is fully
hardened. The first path should fit one structured mu
slope; two slopes are the near-term upper bound. |
animal(1 + x | id, pedigree = ped) |
Planned marker grammar | Parsed as a future animal-model one-slope route, but rejected before fitting until animal intercepts, diagnostics, profile targets, and recovery tests exist. |
spatial(1 | site, coords = coords) |
Implemented first slice | Univariate Gaussian mu spatial random intercept using a
fixed coordinate covariance foundation; mesh/SPDE paths remain
planned. |
spatial(1 + x | site, coords = coords) |
Implemented first one-slope slice | Univariate Gaussian mu spatial random intercept plus
one numeric spatial slope. The two coordinate fields are independent and
have separate SDs. |
relmat(1 + x | id, K = K) |
Planned marker grammar | Parsed as a future lower-level relatedness one-slope route, but rejected before fitting until matrix validation, diagnostics, profile targets, and recovery tests exist. |
corpair(id, level = "group", block = "p", from = "mu1", to = "mu2") ~ x_group |
Implemented | Predictor-dependent ordinary q=2 location-location latent
random-effect correlation regression for matching labelled
mu1/mu2 random intercepts. Predictors must be
constant within id. |
corpair(species, level = "phylogenetic", block = "p", from = "mu1", to = "mu2") ~ ecology |
Implemented | Predictor-dependent phylogenetic q=2 location-location latent
random-effect correlation regression for matching labelled
mu1/mu2 phylo() terms. Predictors
must be constant within species. Location-scale,
scale-scale, q=4, and spatial corpair() regressions remain
planned. |
Bivariate random slopes, non-Gaussian cross-parameter covariance,
spatial q=4 blocks, predictor-dependent q=4 phylogenetic/spatial
correlations, or rho12 random effects |
Planned | These need a larger covariance parameterization, simulation, and
naming checks before fitting. Intercept-slope corpair()
rows are distant-future; a later slope1-slope2 bivariate
plasticity-syndrome target needs coefficient-aware syntax. |
corpairs(fit) |
Implemented for current fitted correlations | Reports residual rho12, ordinary and bivariate
group-level covariance rows including ordinary q=4 blocks, and bivariate
phylogenetic mean-mean or q=4 rows; spatial pair classes remain
planned. |
Current implemented univariate Gaussian random-effect forms include:
drm_formula(y ~ x + (1 | id), sigma ~ z)
drm_formula(y ~ x + (0 + x | id), sigma ~ z)
drm_formula(y ~ x + (1 + x | id), sigma ~ z)
drm_formula(y ~ x + (1 + x | p | id), sigma ~ z)
drm_formula(y ~ x + (1 | id), sigma ~ z + (1 | id) + (0 + w | id))
drm_formula(y ~ x + (1 | p | id), sigma ~ z + (1 | p | id))
drm_formula(y ~ x + (1 | id), sigma ~ z, sd(id) ~ x_group)
drm_formula(y ~ x + phylo(1 | species, tree = tree), sigma ~ z)
drm_formula(
y ~ x + (1 | id) + (1 | site),
sigma ~ z,
sd(id) ~ x_group,
sd(site) ~ site_type
)Current implemented Student-t form:
drm_formula(y ~ x, sigma ~ z, nu ~ 1)
drm_formula(y ~ x, sigma ~ z, nu ~ x)Student-t random effects, known sampling covariance, phylogenetic terms, and bivariate Student-t models are later phases.
Future skew-normal and skew-t formulas should also start with
fixed-effect residual-shape terms such as nu ~ x. A later
latent-effect spelling such as skew(id) ~ x_group would ask
a different question about the distribution of group effects and should
not be taught as an alias for residual nu.
Current implemented lognormal form:
drm_formula(biomass ~ habitat, sigma ~ treatment)paired with:
family = lognormal()The response must be positive and finite. mu and
sigma are defined on the log-response scale, and
fitted() returns the arithmetic response mean
exp(mu + sigma^2 / 2).
Current implemented Gamma form:
drm_formula(biomass ~ habitat, sigma ~ treatment)paired with:
family = Gamma(link = "log")Here mu is the response mean and sigma is
the coefficient of variation. Gamma random effects, known sampling
covariance, phylogenetic terms, and bivariate or mixed Gamma models are
later phases.
The middle p is a group-level covariance-block label. It
is not residual rho12, and reserved distributional
parameter names such as rho12 should not be used as labels.
Matching (1 | p | id) terms in univariate Gaussian
mu and sigma fit the first mean-scale
covariance block; larger cross-formula blocks with slopes remain future
work.
Current implemented bivariate form:
drm_formula(
mu1 = y1 ~ x1 + x2,
mu2 = y2 ~ x1,
sigma1 = ~ x1 + x2,
sigma2 = ~ x1,
rho12 = ~ x1 + x2
)When both responses share the same location predictors, the shorter form is also implemented:
drm_formula(
mvbind(y1, y2) ~ x1 + x2,
sigma1 = ~ x1 + x2,
sigma2 = ~ x1,
rho12 = ~ x1 + x2
)This expands internally to mu1 = y1 ~ x1 + x2 and
mu2 = y2 ~ x1 + x2. Use the explicit mu1 and
mu2 formulas when the two responses need different location
predictors.
The first fitted structured-effect marker is an intercept-only phylogenetic location effect:
drm_formula(y ~ x + phylo(1 | species, tree = tree), sigma ~ z)phylo() uses an ultrametric branch-length tree and adds
a tree-structured Gaussian random intercept to the mu
predictor. Matching intercept-only phylo() terms in
bivariate mu1 and mu2 formulas fit the first
phylogenetic mean-mean correlation slice:
drm_formula(
mu1 = y1 ~ x + phylo(1 | species, tree = tree),
mu2 = y2 ~ x + phylo(1 | species, tree = tree),
sigma1 = ~ 1,
sigma2 = ~ 1,
rho12 = ~ 1
)Phylogenetic random slopes, phylogenetic scale terms, structured
effects in rho12, animal-model slopes, lower-level
relmat() slopes, multiple spatial slopes, and mesh-based
spatial terms are still planned:
# planned; drmTMB() will currently reject these
drm_formula(y ~ x + phylo(1 + x | species, tree = tree), sigma ~ z)
drm_formula(y ~ x + animal(1 + x | id, pedigree = ped), sigma ~ z)
drm_formula(y ~ x + relmat(1 + x | id, K = K), sigma ~ z)
drm_formula(y ~ x + spatial(1 | site, mesh = mesh), sigma ~ z)The first fitted spatial() paths use coordinates
directly for univariate Gaussian mu coefficient fields:
drmTMB(
bf(y ~ x + spatial(1 | site, coords = coords), sigma ~ z),
family = gaussian(),
data = dat
)
drmTMB(
bf(y ~ x + spatial(1 + x | site, coords = coords), sigma ~ z),
family = gaussian(),
data = dat
)coords can have one row per site or one row per
observation, but coordinates must be constant within each site. A
precomputed mesh remains planned. Mesh is not the
scientific sampling level. It is the computational scaffold that will
keep a future SPDE/GMRF spatial field scalable. These structured effects
are separate from residual rho12.
For structured slopes, the fitted coordinate-spatial path now
supports one numeric mu slope as independent intercept and
slope fields with separate SDs. Additional structured slopes remain the
advanced path after more recovery evidence. Multiple random factors
should remain separate additive blocks. The first slope models should
not estimate intercept-slope corpair() rows; the more
interesting future bivariate target is a slope1-slope2 correlation for
the same covariate across responses.
The first bivariate group-level covariance syntax is explicit about
the distinction between group-level covariance labels and residual
rho12:
drm_formula(
mu1 = y1 ~ x1 + x2 + (1 | p | ID),
mu2 = y2 ~ x1 + (1 | p | ID),
sigma1 = ~ x1 + x2 + (1 | q | ID),
sigma2 = ~ x1 + (1 | q | ID),
rho12 = ~ x1 + x2
)The shared p label fits a group-level random-intercept
covariance block for the two response means. The shared q
label fits a separate scale-scale block for the two residual-scale
random intercepts on the log-sigma scale. The first
same-response mean-scale bridge is also implemented, using a shared
label in mu1 and sigma1, or in
mu2 and sigma2. Reusing the same label and
group in all four bivariate formulas fits the first ordinary q=4
location-scale block:
drm_formula(
mu1 = y1 ~ x + (1 | p | ID),
mu2 = y2 ~ x + (1 | p | ID),
sigma1 = ~ z + (1 | p | ID),
sigma2 = ~ z + (1 | p | ID),
rho12 = ~ w
)That q=4 block reports one location-location row, four location-scale
rows, and one scale-scale row in corpairs(). Fitted rows
still use the older mean-mean and mean-scale
class names, but corpairs() accepts
location-location and location-scale as filter
aliases to match the reserved corpair() formula wording.
The first fitted corpair() route is the ordinary q=2
location-location model:
corpair(ID, level = "group", block = "p", from = "mu1", to = "mu2") ~ w.
The predictor w must be constant within ID,
and corpairs() reports the mean, range, and number of
fitted group-level correlations. Later location-scale routes should also
be endpoint-specific, for example
corpair(ID, level = "group", block = "p", from = "mu1", to = "sigma2") ~ w,
because the four location-scale pairs in q=4 models do not have
identical interpretation. For phylogenetic life-history trade-off
models, the same grammar now also covers the fitted q=2
location-location route:
corpair(species, level = "phylogenetic", block = "p", from = "mu1", to = "mu2") ~ ecology.
The phylogenetic formula has an extra design gate: because the tree
couples all species, drmTMB needs a positive-definite
covariance contract for the full species block before a species-level
predictor can change the latent phylogenetic correlation. The selected
q=2 contract uses two independent unit phylogenetic fields and
species-specific loadings, so variable rho_l is a
nonstationary phylogenetic covariance model rather than a residual-style
correlation row. It covers mu1-mu2 only;
phylogenetic location-scale and scale-scale correlation regressions
remain q=4 extensions. Bivariate random slopes, spatial q=4 blocks,
predictor-dependent q=4 phylogenetic correlations, and
rho12 random effects remain planned. Double-hierarchical
bivariate fits should continue to report correlation pairs in a long
table, not just as unlabeled matrices. The design note
docs/design/20-coscale-correlation-pairs.md records the
planned labels: correlation level, group, covariance block,
distributional parameters, responses, and random-effect coefficients.
The current corpairs(fit) helper already uses this table
shape for residual rho12, ordinary univariate Gaussian
mu random-effect correlations, and the implemented
bivariate mean-mean, scale-scale, same-response mean-scale, ordinary
q=4, and phylogenetic mean-mean random-intercept correlations.