Distributional regression with drmTMB • drmTMB

drmTMB is being developed as a model-first package: the documentation should pair symbolic equations with the R syntax that fits them. The current implementation includes Gaussian location-scale models with fixed effects, ordinary location and scale random effects, group-level random-effect scale formulae such as sd(id) ~ x_group, fixed-effect non-Gaussian families, known-covariance Gaussian meta-analysis, bivariate Gaussian residual coupling with rho12 ~ predictors, first ordinary bivariate covariance blocks, phylogenetic Gaussian covariance slices, and coordinate-based spatial mu intercept and one-slope fields. The articles “What can I fit today?” and “Choosing response families” give the broader implemented map.

The current package is already useful for one central workflow: write one formula for each parameter you want to estimate, then check that the fitted model matches the symbolic model you had in mind.

Install the preview

drmTMB is not on CRAN yet. Install the tagged preview from GitHub with pak:

install.packages("pak")
pak::pak("itchyshin/drmTMB@v0.1.2")

For the newest development build from main, use:

pak::pak("itchyshin/drmTMB")

You need R 4.1.0 or newer and a working compiler toolchain because TMB models are compiled during installation. If installation fails while compiling C++, install the usual R build tools for your platform: Rtools on Windows, Xcode Command Line Tools on macOS, or the R development toolchain on Linux.

The core runtime dependencies are installed automatically by pak: cli, Matrix, TMB, and the compiled headers from RcppEigen and TMB. The articles and development checks also use optional packages such as glmmTMB, lme4, MASS, metafor, knitr, rmarkdown, testthat, withr, and pkgdown.

For example, an applied user might ask: do mean trait values and residual variability change with an environmental predictor, after accounting for repeated measures from the same site or species? In drmTMB, that question is written as one formula for the mean and one formula for the residual scale.

Fit your first model

Start with a Gaussian location-scale model when the response is continuous and the scientific question is about both the expected value and predictability. In the small example below, habitat and temperature affect mean growth, while habitat also changes residual variation:

set.seed(13)
n <- 120
dat <- data.frame(
  habitat = factor(rep(c("forest", "grassland"), each = n / 2)),
  temperature = rnorm(n)
)
mu <- 1 + 0.6 * (dat$habitat == "grassland") + 0.4 * dat$temperature
sigma <- exp(-0.5 + 0.45 * (dat$habitat == "grassland"))
dat$growth <- rnorm(n, mean = mu, sd = sigma)

The fitted model uses one formula for mu and one formula for sigma:

fit <- drmTMB(
  drm_formula(growth ~ habitat + temperature, sigma ~ habitat),
  family = gaussian(),
  data = dat
)

check_drm(fit)
#> <drm_check: 10 checks>
#> ok: 10; notes: 0; warnings: 0; errors: 0
#>                      check status
#>      optimizer_convergence     ok
#>           optimizer_budget     ok
#>           finite_objective     ok
#>             fixed_gradient     ok
#>            sdreport_status     ok
#>  hessian_positive_definite     ok
#>     standard_errors_finite     ok
#>               dropped_rows     ok
#>             positive_scale     ok
#>   fixed_effect_design_size     ok
#>                                                                                   value
#>                                                                                       0
#>                                                 iterations=17; function=24; gradient=18
#>                                                                                   144.3
#>                                                  max=0.0005175; component=beta_sigma[1]
#>                                                                                      ok
#>                                                                                    TRUE
#>                                                                  range=[0.07859,0.1484]
#>                                                                     nobs=120; dropped=0
#>                                                                              min=0.7396
#>  total_mb=0.02184; max_cols=3; largest=mu; largest_class=matrix; largest_density=0.8333
#>                                                                              message
#>                                                        nlminb convergence code is 0.
#>  Optimizer evaluation counts recorded; no eval.max or iter.max control was supplied.
#>                                             Objective and log-likelihood are finite.
#>     Maximum absolute fixed gradient is <= 0.001; largest component is beta_sigma[1].
#>                                              TMB::sdreport() completed successfully.
#>                                        sdreport reports a positive-definite Hessian.
#>                                         All fixed-effect standard errors are finite.
#>                   No rows were dropped by model-frame or known-covariance filtering.
#>                                     All fitted scale values are finite and positive.
#>                          Dense fixed-effect design matrices are modest for this fit.

Read the sigma coefficient as a log residual-SD contrast. Exponentiating it gives an SD ratio; exponentiating twice the coefficient gives a residual variance ratio:

sigma_habitat <- coef(fit, "sigma")["habitatgrassland"]
data.frame(
  residual_sd_ratio = exp(sigma_habitat),
  residual_variance_ratio = exp(2 * sigma_habitat)
)
#>                  residual_sd_ratio residual_variance_ratio
#> habitatgrassland          1.186114                1.406866

For a fuller walkthrough of fitted means, residual SDs, and residual variances, read “When variance carries signal”.

Learning path

Start with the question you want to answer, not with the full list of features. The tutorials are arranged so that each article pairs symbolic equations, R syntax, fitted output, and interpretation.

If your question is…	Read this first	Main parameter
Do predictors change the mean and residual variability of one response?	When variance carries signal	`mu`, `sigma`
Am I modelling residual variation, group-level variation, or likelihood weights?	Which scale are you modelling?	`sigma`, `sd(group)`, `weights`
Which continuous, count, proportion, or robust family should I use?	Choosing response families	family-specific `mu`, `sigma`, `nu`, `zi`, `hu`
Do counts show extra-Poisson variation or structural zeros?	Count abundance and extra zeros	NB2 `mu`, `sigma`, `zi`
Are proportions successes out of trials or continuous rates inside `(0, 1)`?	Proportions and success rates	beta and beta-binomial `mu`, `sigma`
Does a predictor change residual coupling between two responses?	Changing residual coupling with rho12	`rho12`
Do repeated individuals have correlated average responses after accounting for residual coupling?	Changing residual coupling with rho12	`corpairs(level = "group")`
Do effect sizes have known sampling variances or covariance?	Mean effects and residual heterogeneity in meta-analysis	`meta_known_V(V = V)`, `sigma`
Do group or phylogenetic covariance rows answer a bivariate question beyond residual coupling?	What can I fit today?	`corpairs()`, `corpair()`, `sd_phylo*()`
Do species trait means remain similar after accounting for shared ancestry, or do sites share coordinate-structured deviations?	Structural dependence	`phylo()`, `spatial()`
After fitting, which checks and interval-status columns should I read?	Checking and using fitted models	`check_drm()`, `profile_targets()`, `conf.status`

Here nu is a shape parameter, zi and hu are zero-process parameters, and sd(group) refers to a group-level random-effect standard deviation rather than the residual scale sigma. Use the model guide “What can I fit today?” when you need the longer status map before choosing syntax.

For a first applied analysis, fit the simplest model that answers the question, run check_drm(), and then read the coefficient table on the parameter scale used by the model. For example, sigma coefficients are on a log scale in Gaussian location-scale models, while rho12(fit) returns residual correlations on the response scale.

For slope and variance-component questions, name the estimand before reporting the number. A mu slope is an expected-response effect, a sigma slope is a log residual-SD effect, a random-slope SD is among-group variation in a reaction norm, and sd(group) ~ x_group is a model for the SD of a group-level mean effect. Those four quantities can all involve a predictor, but they answer different biological questions.

What can I fit today?

The detailed implementation map now lives in the model guide “What can I fit today?”. This getting-started article keeps the first fitted example close to the top, then sends you to the right guide or worked tutorial.

At a high level, the current public surface is:

Model family or component	Current status	Main article
Gaussian location-scale fixed effects	stable	“When variance carries signal”
Gaussian `mu` random intercepts and one-slope blocks	stable	“When variance carries signal”
residual-scale random intercepts and independent slopes	stable or first slice, depending on term	“Which scale are you modelling?”
unlabelled Gaussian `mu` random-intercept SD formulas	first slice	“Which scale are you modelling?”
fixed-effect non-Gaussian families	stable for named fixed-effect paths	“Choosing response families”
fixed-effect NB2 and zero-inflated NB2 count examples	stable for fixed effects	“Count abundance and extra zeros”
fixed-effect beta and beta-binomial proportion examples	stable for fixed effects	“Proportions and success rates”
Gaussian meta-analysis with known sampling covariance	stable	“Mean effects and residual heterogeneity in meta-analysis”
fixed-effect bivariate Gaussian `rho12`	stable	“Changing residual coupling with rho12”
ordinary bivariate covariance and `corpairs()` rows	first slice	“What can I fit today?”
phylogenetic Gaussian `mu`, q=2, q=4, and direct-SD covariance slices	first slices	“Structural dependence”
coordinate spatial Gaussian `mu` intercept and one numeric slope	first slice	“Structural dependence”
fitted interval status and derived-summary flags	first slice	“Checking and using fitted models”

After fitting any model, run check_drm() before interpreting the estimates:

check_drm(fit)

The diagnostic table flags convergence, finite objective values, Hessian status, fixed-effect standard errors, dropped rows, positive scale values, near-zero random-effect SDs, residual rho12 boundaries, Student-t nu boundary behaviour, known sampling covariance summaries, random-effect replication, and weak random-slope designs. A note marks something to inspect; a warning or error should be resolved before treating estimates as stable.

The key rule is to keep correlation layers separate. A bivariate residual rho12 is within-observation coupling after the two means and residual SDs are modelled. A random-effect correlation is a group-level quantity. Phylogenetic rows from corpairs(..., level = "phylogenetic") are structured-effect quantities. Coordinate spatial fits currently report field SDs and conditional modes; richer spatial correlation rows remain planned. Those layers should not be reported as if they were the same estimand.