This article describes how drmTMB likelihoods should be
checked before they are treated as implemented. It is written for
package contributors: the goal is to make each model defensible by
matching the symbolic model, the R syntax, the TMB likelihood, and the
tests.
The planning reference is
docs/design/05-testing-strategy.md. This article is the
practical version: what to put in a pull request or phase before asking
users to trust a model.
In this guide, location means a mean-like parameter such as
mu; scale means a residual spread or dispersion-like
parameter such as sigma; shape means a family-specific
parameter such as Student-t nu; and coscale means residual
coupling, represented in the bivariate Gaussian model by
rho12.
The contract
Every fitted model should have the same four-part contract:
- a symbolic model that defines the likelihood;
- the
drmTMBsyntax that represents that model; - an independent likelihood, simulation, or comparator check;
- a clear statement of what is still unsupported.
For example, the fixed-effect Gaussian location-scale model is:
y_i | mu_i, sigma_i ~ Normal(mu_i, sigma_i^2)
mu_i = beta_0 + beta_1 x_i
log(sigma_i) = gamma_0 + gamma_1 x_i
The matching user syntax is:
fit <- drmTMB(
drm_formula(
y ~ x,
sigma ~ x
),
family = gaussian(),
data = dat
)The test should then simulate from the same equations, fit the same
model, and check recovery of beta and gamma.
If the equation, syntax, or TMB template changes, the test should change
in the same commit.
Tier 1: comparator checks
Comparator checks compare drmTMB with established
software in an overlapping case. They are not enough by themselves,
because no other package has exactly the whole drmTMB
grammar. They are still powerful: they catch likelihood constants, scale
definitions, covariance assembly, and optimizer plumbing.
The simplest Gaussian mixed-model comparator is:
y_ij | b_j, sigma ~ Normal(beta_0 + beta_1 x_ij + b_j, sigma^2)
b_j ~ Normal(0, sd_id^2)
with syntax:
fit <- drmTMB(
drm_formula(y ~ x + (1 | id)),
family = gaussian(),
data = dat
)
fit_lme4 <- lme4::lmer(
y ~ x + (1 | id),
data = dat,
REML = FALSE
)The comparator test should check fixed effects, random-effect
standard deviations, residual scale, and logLik(). Checking
the log likelihood matters because equal coefficients can hide a wrong
likelihood constant or covariance term.
Meta-analysis gives another useful comparator. The implemented dense
known-V Gaussian model is:
y ~ Normal(X beta, V + sigma^2 I)
where V is known sampling covariance and
sigma^2 is residual heterogeneity. The matching
drmTMB syntax is:
fit <- drmTMB(
drm_formula(yi ~ x + meta_known_V(V = V)),
family = gaussian(),
data = dat
)For the overlapping model, the comparator is:
fit_metafor <- metafor::rma.mv(
yi = yi,
V = V,
mods = ~ x,
random = ~ 1 | obs,
data = dat,
method = "ML"
)Here obs has one level per row, so metafor
estimates one constant observation-level heterogeneity variance. The
corresponding drmTMB test should compare:
expect_equal(unname(coef(fit, "mu")),
unname(stats::coef(fit_metafor)),
tolerance = 1e-4)
expect_equal(stats::sigma(fit)[[1L]]^2,
fit_metafor$sigma2[[1L]],
tolerance = 1e-4)
expect_equal(as.numeric(stats::logLik(fit)),
as.numeric(stats::logLik(fit_metafor)),
tolerance = 1e-4)Optional comparator packages should use
testthat::skip_if_not_installed() so routine checks stay
portable.
Count models also need comparator checks when another package has the
same parameterization. For the overlapping negative-binomial 2 case with
constant overdispersion, MASS::glm.nb() estimates
theta = size, whereas drmTMB uses
sigma = 1 / sqrt(size):
y_i ~ NB2(mu_i, size)
log(mu_i) = beta_0 + beta_1 x_i
sigma = 1 / sqrt(size)
fit <- drmTMB(
drm_formula(count ~ x, sigma ~ 1),
family = nbinom2(),
data = dat
)
fit_mass <- MASS::glm.nb(count ~ x, data = dat)The comparator should check the mu coefficients, the
transformed overdispersion scale, and logLik():
expect_equal(unname(coef(fit, "mu")),
unname(stats::coef(fit_mass)),
tolerance = 1e-4)
expect_equal(sigma(fit)[[1L]],
1 / sqrt(fit_mass$theta),
tolerance = 1e-4)
expect_equal(as.numeric(stats::logLik(fit)),
as.numeric(stats::logLik(fit_mass)),
tolerance = 1e-4)For beta mean-scale models, the always-available comparator is the
base R stats::dbeta() density at the fitted coefficients.
The public sigma parameter maps to precision as
phi = 1 / sigma^2:
logit(mu_i) = X_mu[i, ] beta_mu
log(sigma_i) = X_sigma[i, ] beta_sigma
phi_i = 1 / sigma_i^2
alpha_i = mu_i phi_i
beta_i = (1 - mu_i) phi_i
eta_mu <- as.vector(fit$model$X$mu %*% coef(fit, "mu"))
eta_sigma <- as.vector(fit$model$X$sigma %*% coef(fit, "sigma"))
mu <- plogis(eta_mu)
sigma <- exp(eta_sigma)
phi <- 1 / sigma^2
ll_independent <- sum(stats::dbeta(
fit$model$y,
shape1 = mu * phi,
shape2 = (1 - mu) * phi,
log = TRUE
))
expect_equal(as.numeric(logLik(fit)), ll_independent, tolerance = 1e-6)Tier 2: simulation recovery
Simulation recovery is the main internal validation source for new
drmTMB features. It should be used whenever a model has no
exact comparator, or whenever the package adds a distributional
parameter that other software does not model in the same way.
A simulation recovery test should:
- choose known parameters on the link scale;
- simulate from the symbolic model;
- fit the matching
drmTMBmodel; - check optimizer convergence;
- check estimates on the scale that is most meaningful for the parameter;
- include at least one edge or malformed-input case nearby.
For the Student-t location-scale-shape model, the implemented equations are:
y_i | mu_i, sigma_i, nu_i ~ Student-t(mu_i, sigma_i, nu_i)
mu_i = beta_0 + beta_1 x_i
log(sigma_i) = gamma_0 + gamma_1 x_i
nu_i = 2 + exp(delta_0 + delta_1 x_i)
The matching syntax is:
fit <- drmTMB(
drm_formula(
y ~ x,
sigma ~ x,
nu ~ x
),
family = student(),
data = dat
)The test should check that nu stays above 2, because the
current parameterization makes the residual variance finite. For a
fitted Student-t model, sigma is a scale parameter; the
residual standard deviation is sigma * sqrt(nu / (nu - 2))
when nu > 2.
Independent likelihood checks
When the density is simple enough, add a likelihood check that does not call the TMB objective. This is especially useful for the first implementation of a family. For the Student-t model above, an independent R calculation has the form:
eta_mu <- X_mu %*% beta_mu
eta_sigma <- X_sigma %*% beta_sigma
eta_nu <- X_nu %*% beta_nu
mu <- as.vector(eta_mu)
sigma <- exp(as.vector(eta_sigma))
nu <- 2 + exp(as.vector(eta_nu))
loglik <- sum(
stats::dt((y - mu) / sigma, df = nu, log = TRUE) - log(sigma)
)The test should compare logLik(fit) with
loglik at the fitted parameter values. This catches
transform mistakes such as forgetting the Jacobian-like
-log(sigma) density adjustment for scaled Student-t
residuals.
For zero-truncated NB2, the independent likelihood must include the normalizing probability that the untruncated NB2 count component is positive:
y_i | y_i > 0, mu_i, sigma_i ~ truncated NB2(mu_i, sigma_i)
Pr_trunc(y_i) = Pr_NB2(y_i) / (1 - Pr_NB2(0))
eta_mu <- X_mu %*% beta_mu
eta_sigma <- X_sigma %*% beta_sigma
mu <- exp(as.vector(eta_mu))
sigma <- exp(as.vector(eta_sigma))
size <- 1 / sigma^2
log_p0 <- stats::dnbinom(0, mu = mu, size = size, log = TRUE)
log_positive <- log1p(-exp(log_p0))
loglik <- sum(
stats::dnbinom(y, mu = mu, size = size, log = TRUE) - log_positive
)The matching test should also check that all responses are positive
integers, that zeros are rejected, that
predict(fit, dpar = "mu") returns the untruncated NB2
component mean, and that fitted(fit) returns
mu / (1 - Pr_NB2(0)).
For hurdle NB2, the independent likelihood separates the zero process from the positive truncated NB2 component:
Pr(y_i = 0) = hu_i
Pr(y_i = k > 0) = (1 - hu_i) Pr_NB2(k) / (1 - Pr_NB2(0))
eta_mu <- X_mu %*% beta_mu
eta_sigma <- X_sigma %*% beta_sigma
eta_hu <- X_hu %*% beta_hu
mu <- exp(as.vector(eta_mu))
sigma <- exp(as.vector(eta_sigma))
hu <- plogis(as.vector(eta_hu))
size <- 1 / sigma^2
log_p0 <- stats::dnbinom(0, mu = mu, size = size, log = TRUE)
log_positive <- log1p(-exp(log_p0))
loglik <- sum(ifelse(
y == 0,
log(hu),
log1p(-hu) + stats::dnbinom(y, mu = mu, size = size, log = TRUE) -
log_positive
))The matching test should also check that zeros are accepted only when
hu is present, that predict(fit, dpar = "hu")
returns a probability, and that fitted(fit) returns
(1 - hu) * mu / (1 - Pr_NB2(0)).
Bivariate and coscale checks
The first bivariate Gaussian location-coscale model uses:
[y1_i, y2_i]' ~ MVN([mu1_i, mu2_i]', Omega_i)
Omega_i =
[ sigma1_i^2 rho12_i sigma1_i sigma2_i ]
[ rho12_i sigma1_i sigma2_i sigma2_i^2 ]
eta_rho12_i = X_rho12[i, ] beta_rho12
rho12_i = tanh(eta_rho12_i)
The C++ bivariate Gaussian branch uses a tiny boundary guard around this transform during optimization. Likelihood tests should know that implementation detail, but tutorial equations should show the clean statistical transform.
The matching syntax is:
fit <- drmTMB(
drm_formula(
mu1 = y1 ~ x1 + x2,
mu2 = y2 ~ x1,
sigma1 = ~ x1 + x2,
sigma2 = ~ x1,
rho12 = ~ x1 + x2
),
family = c(gaussian(), gaussian()),
data = dat
)Tests for this model should include rho12 = 0, moderate
negative and positive rho12, high negative and positive
rho12 near +/-0.8, predictor-dependent
rho12, and unequal sigma1 and
sigma2.
For bivariate meta-analysis, keep two covariance objects separate:
S_i[1,2] = known within-study sampling covariance
Omega_i[1,2] = rho12_i sigma1_i sigma2_i
In the current implementation, bivariate
meta_known_V(V = V) must evaluate to a complete-row
2n by 2n row-paired covariance matrix, with
meta_vcov_bivariate() available as the helper. Conceptually
this stacked matrix contains the known within-study sampling covariance
blocks S_i. rho12 models the unknown residual
covariance component. A valid test must show that fitted
rho12 is not merely reproducing the known sampling
correlation.
Boundary and rejection tests
Likelihood tests should include unsupported syntax and boundary-prone cases. For early phases, rejecting a model clearly is a feature, because it protects users from syntax that looks plausible but has no fitted likelihood path.
Examples of rejection tests:
expect_snapshot(
{
drmTMB(
drm_formula(y ~ x + meta_known_V(V = V), nu ~ x),
family = student(),
data = dat
)
},
error = TRUE
)
expect_snapshot(
{
drmTMB(
drm_formula(
mu1 = y1 ~ x + (1 | id),
mu2 = y2 ~ x,
sigma1 = ~ 1,
sigma2 = ~ 1,
rho12 = ~ 1
),
family = c(gaussian(), gaussian()),
data = dat
)
},
error = TRUE
)The exact messages may change, but the rule should not: an unsupported combination should fail before optimization with an explanation of the missing feature.
Fast and long tests
CRAN-facing tests should be small and deterministic. They answer whether the likelihood is wired correctly for a few representative cases.
Long tests belong in optional scripts, scheduled CI, or a simulation study. They answer whether the estimator behaves well across sample sizes, effect sizes, mesh densities, tree sizes, boundary variances, and extreme correlations.
Use this split deliberately:
| Test type | Runs in routine checks | Purpose |
|---|---|---|
| Unit test | yes | parser, family, extractor, and error-message behaviour |
| Independent likelihood check | yes | density constants, transforms, covariance assembly |
| Small simulation recovery | yes | basic estimator recovery |
| Comparator smoke test | yes, with skip_if_not_installed()
|
parity with established software |
| Large simulation grid | no | operating characteristics across parameter space |
| Bootstrap or profile-likelihood coverage | no, until scheduled | uncertainty calibration |
Before closing a task
Before a likelihood task is complete, run the after-task audit. The compact checklist is:
- the symbolic equation, R syntax, and TMB implementation describe the same model;
- tests include either an independent likelihood check, a comparator check, or simulation recovery;
- at least one unsupported or malformed neighbouring case is checked;
-
docs/design/03-likelihoods.md,docs/design/05-testing-strategy.md, and relevant vignettes are synchronized; -
pkgdown::check_pkgdown()andpkgdown::build_site()pass for user-facing documentation changes; -
docs/dev-log/check-log.mdand an after-task report record the checks, what did not go smoothly, and what remains unsupported.
This discipline is deliberately slow at the end of each task. It is much faster than debugging a package whose equations, examples, tests, and documentation have drifted apart.