Skip to contents

Many ecological responses are counts: fledglings per nest, parasites per host, insects in a trap, or soil invertebrates in a quadrat. A Poisson model is the natural baseline, but it fixes the variance equal to the mean. The location-scale count question is sharper: do predictors change the expected abundance, the extra-Poisson variation around that abundance, or the chance of a separate structural zero?

This article stays inside the currently implemented drmTMB count surface: univariate NB2 models, fixed-effect zero-inflated NB2 models, and the first ordinary NB2 mu random-effect path. The worked example below remains fixed-effect so the mean, overdispersion, and structural-zero pieces are easy to read before adding group effects.

The source motivation comes from Nakagawa et al. (2026), who use location-scale models to discuss heteroscedasticity in continuous, count, and proportion data. Their count section highlights negative-binomial models for fledglings, insect colony size, parasites, and soil invertebrates, and it separates overdispersion from structural-zero processes. drmTMB uses the same scientific split, but reports the NB2 scale as public sigma rather than the native size or precision parameter often written as theta.

Model Equation And Syntax

For an overdispersed count model, nbinom2() uses

Yiμi,σiNB2(μi,sizei),log(μi)=log(Ei)+β0+β1restoredi+β2moisturei,log(σi)=γ0+γ1restoredi,sizei=1/σi2,E[Yi]=μi,Var(Yi)=μi+σi2μi2. \begin{aligned} Y_i \mid \mu_i, \sigma_i &\sim \operatorname{NB2}(\mu_i, \text{size}_i),\\ \log(\mu_i) &= \log(E_i) + \beta_0 + \beta_1 \text{restored}_i + \beta_2 \text{moisture}_i,\\ \log(\sigma_i) &= \gamma_0 + \gamma_1 \text{restored}_i,\\ \text{size}_i &= 1 / \sigma_i^2,\\ E[Y_i] &= \mu_i,\\ \operatorname{Var}(Y_i) &= \mu_i + \sigma_i^2\mu_i^2. \end{aligned}

The matching drmTMB syntax is:

drmTMB(
  bf(
    springtails ~ habitat + moisture + offset(log(trap_nights)),
    sigma ~ habitat
  ),
  family = nbinom2(),
  data = soil_counts
)

Read each parameter before interpreting the fitted model:

Symbol or syntax Meaning In the soil-invertebrate example
YiY_i, springtails observed non-negative integer count springtails found in trap ii
EiE_i, trap_nights sampling effort exposure number of nights the trap was active
offset(log(trap_nights)) forces expected count to scale with effort estimates abundance rate per trap night
μi\mu_i, mu expected count after accounting for effort expected springtail abundance in a trap
𝛃μ\boldsymbol{\beta}_{\mu} location coefficients on the log mean scale habitat and moisture effects on expected abundance
σi\sigma_i, sigma NB2 extra-Poisson scale variation beyond the Poisson expectation
𝛃σ\boldsymbol{\beta}_{\sigma} scale coefficients on the log sigma scale habitat effects on extra-Poisson variation
sizei\text{size}_i, theta in some papers native NB2 size or precision size_i = 1 / sigma_i^2, so larger sigma means smaller size

The sigma slope is a variability slope. If gamma_1 is the restored-habitat coefficient, then

σrestoredσdegraded=exp(γ1),σrestored2σdegraded2=exp(2γ1). \frac{\sigma_\text{restored}}{\sigma_\text{degraded}} = \exp(\gamma_1), \qquad \frac{\sigma_\text{restored}^2}{\sigma_\text{degraded}^2} = \exp(2\gamma_1).

If a paper reports the native NB2 parameter θi=1/σi2\theta_i = 1 / \sigma_i^2, the direction reverses:

θrestoredθdegraded=exp(2γ1). \frac{\theta_\text{restored}}{\theta_\text{degraded}} = \exp(-2\gamma_1).

That reversal is why the tutorial keeps saying sigma: in drmTMB, larger sigma always means more modelled variation for this family.

A Soil-Invertebrate Example

Suppose springtails are counted from soil traps in degraded and restored grassland plots. Restoration may increase average abundance, but restored plots may also be more patchy while litter and vegetation structure re-establish. A few bare-soil microsites may be true absences where springtails are not available to be trapped.

This transparent simulation gives us known structure before fitting:

set.seed(106)
n <- 360
soil_counts <- data.frame(
  habitat = factor(
    rep(c("degraded", "restored"), each = n / 2),
    levels = c("degraded", "restored")
  ),
  surface = factor(
    sample(c("litter", "bare"), n, replace = TRUE, prob = c(0.72, 0.28)),
    levels = c("litter", "bare")
  ),
  moisture = as.numeric(scale(runif(n, 0.15, 0.95))),
  trap_nights = sample(2:5, n, replace = TRUE)
)

restored <- as.numeric(soil_counts$habitat == "restored")
bare <- as.numeric(soil_counts$surface == "bare")

rate <- exp(log(2.4) + 0.35 * restored + 0.30 * soil_counts$moisture)
mu <- soil_counts$trap_nights * rate
sigma_nb2 <- exp(-0.75 + 0.40 * restored)
zi <- plogis(-3.2 + 1.6 * bare - 0.25 * restored)

structural_zero <- runif(n) < zi
soil_counts$springtails <- ifelse(
  structural_zero,
  0L,
  rnbinom(n, size = 1 / sigma_nb2^2, mu = mu)
)

head(soil_counts)
#>    habitat surface   moisture trap_nights springtails
#> 1 degraded  litter -1.1665372           3           1
#> 2 degraded    bare  1.5939065           4           4
#> 3 degraded  litter  0.4203317           5          14
#> 4 degraded  litter  0.6364855           5           7
#> 5 degraded  litter  1.0379937           4           5
#> 6 degraded  litter -1.4584601           2           3

Start with the NB2 location-scale model before adding a structural-zero submodel:

fit_nb2 <- drmTMB(
  bf(
    springtails ~ habitat + moisture + offset(log(trap_nights)),
    sigma ~ habitat
  ),
  family = nbinom2(),
  data = soil_counts
)

Run diagnostics before interpreting coefficients:

check_drm(fit_nb2)
#> <drm_check: 10 checks>
#> ok: 10; notes: 0; warnings: 0; errors: 0
#>                      check status
#>      optimizer_convergence     ok
#>           optimizer_budget     ok
#>           finite_objective     ok
#>             fixed_gradient     ok
#>            sdreport_status     ok
#>  hessian_positive_definite     ok
#>     standard_errors_finite     ok
#>               dropped_rows     ok
#>             positive_scale     ok
#>   fixed_effect_design_size     ok
#>                                                                                   value
#>                                                                                       0
#>                                                 iterations=17; function=29; gradient=17
#>                                                                                   1130.
#>                                                     max=0.0007836; component=beta_mu[3]
#>                                                                                      ok
#>                                                                                    TRUE
#>                                                                  range=[0.04334,0.1013]
#>                                                                     nobs=360; dropped=0
#>                                                                              min=0.6499
#>  total_mb=0.06026; max_cols=3; largest=mu; largest_class=matrix; largest_density=0.8333
#>                                                                              message
#>                                                        nlminb convergence code is 0.
#>  Optimizer evaluation counts recorded; no eval.max or iter.max control was supplied.
#>                                             Objective and log-likelihood are finite.
#>        Maximum absolute fixed gradient is <= 0.001; largest component is beta_mu[3].
#>                                              TMB::sdreport() completed successfully.
#>                                        sdreport reports a positive-definite Hessian.
#>                                         All fixed-effect standard errors are finite.
#>                   No rows were dropped by model-frame or known-covariance filtering.
#>                                     All fitted scale values are finite and positive.
#>                          Dense fixed-effect design matrices are modest for this fit.

The NB2 fit answers two questions. The mu coefficients describe abundance rates, because the offset has already accounted for trap effort. The sigma coefficients describe extra-Poisson variation in the count component:

coef(fit_nb2, "mu")
#>     (Intercept) habitatrestored        moisture 
#>       0.7603134       0.3511186       0.2422656
coef(fit_nb2, "sigma")
#>     (Intercept) habitatrestored 
#>      -0.4309409       0.2758195

sigma_ratio <- exp(coef(fit_nb2, "sigma")["habitatrestored"])
c(
  sigma_ratio_restored_vs_degraded = sigma_ratio,
  variance_multiplier_at_same_mu = sigma_ratio^2
)
#> sigma_ratio_restored_vs_degraded.habitatrestored 
#>                                         1.317610 
#>   variance_multiplier_at_same_mu.habitatrestored 
#>                                         1.736096

The second value is the multiplier on the quadratic extra-Poisson part σi2μi2\sigma_i^2\mu_i^2 when two traps have the same expected count. It is not the full variance ratio whenever μi\mu_i also changes, because NB2 variance contains both μi\mu_i and σi2μi2\sigma_i^2\mu_i^2.

When Zeros Are A Separate Process

The paper example of soil invertebrates in patchy habitats maps naturally to a zero-inflated count model. Bare-soil microsites can be true absences, while litter microsites can still produce ordinary sampling zeros from the NB2 count component. Add a zi formula when the biological question needs that split:

Pr(Yi=0)=πi+(1πi)PrNB2(0μi,σi),Pr(Yi=k>0)=(1πi)PrNB2(kμi,σi),logit(πi)=δ0+δ1barei,Var(Yi)=(1πi)(μi+σi2μi2)+πi(1πi)μi2. \begin{aligned} \Pr(Y_i = 0) &= \pi_i + (1 - \pi_i)\Pr_{\operatorname{NB2}}(0 \mid \mu_i,\sigma_i),\\ \Pr(Y_i = k > 0) &= (1 - \pi_i)\Pr_{\operatorname{NB2}}(k \mid \mu_i,\sigma_i),\\ \operatorname{logit}(\pi_i) &= \delta_0 + \delta_1 \text{bare}_i,\\ \operatorname{Var}(Y_i) &= (1 - \pi_i)(\mu_i + \sigma_i^2\mu_i^2) + \pi_i(1 - \pi_i)\mu_i^2. \end{aligned}

In this equation, πi\pi_i is the structural-zero probability. It is not a scale parameter and should not be interpreted as overdispersion.

fit_zinb2 <- drmTMB(
  bf(
    springtails ~ habitat + moisture + offset(log(trap_nights)),
    sigma ~ habitat,
    zi ~ surface
  ),
  family = nbinom2(),
  data = soil_counts
)

check_drm(fit_zinb2)
#> <drm_check: 10 checks>
#> ok: 10; notes: 0; warnings: 0; errors: 0
#>                      check status
#>      optimizer_convergence     ok
#>           optimizer_budget     ok
#>           finite_objective     ok
#>             fixed_gradient     ok
#>            sdreport_status     ok
#>  hessian_positive_definite     ok
#>     standard_errors_finite     ok
#>               dropped_rows     ok
#>             positive_scale     ok
#>   fixed_effect_design_size     ok
#>                                                                                   value
#>                                                                                       0
#>                                                 iterations=31; function=41; gradient=32
#>                                                                                   1105.
#>                                                 max=0.00004223; component=beta_sigma[2]
#>                                                                                      ok
#>                                                                                    TRUE
#>                                                                  range=[0.03636,0.4202]
#>                                                                     nobs=360; dropped=0
#>                                                                              min=0.4904
#>  total_mb=0.08898; max_cols=3; largest=mu; largest_class=matrix; largest_density=0.8333
#>                                                                              message
#>                                                        nlminb convergence code is 0.
#>  Optimizer evaluation counts recorded; no eval.max or iter.max control was supplied.
#>                                             Objective and log-likelihood are finite.
#>     Maximum absolute fixed gradient is <= 0.001; largest component is beta_sigma[2].
#>                                              TMB::sdreport() completed successfully.
#>                                        sdreport reports a positive-definite Hessian.
#>                                         All fixed-effect standard errors are finite.
#>                   No rows were dropped by model-frame or known-covariance filtering.
#>                                     All fitted scale values are finite and positive.
#>                          Dense fixed-effect design matrices are modest for this fit.

The zi coefficients are on the log-odds scale. Prediction gives the structural-zero probability on the response scale:

coef(fit_zinb2, "zi")
#> (Intercept) surfacebare 
#>   -2.786077    1.215586

zero_grid <- data.frame(
  habitat = factor(c("degraded", "degraded"), levels = levels(soil_counts$habitat)),
  surface = factor(c("litter", "bare"), levels = levels(soil_counts$surface)),
  moisture = 0,
  trap_nights = 3
)

data.frame(
  surface = zero_grid$surface,
  structural_zero_probability = predict(fit_zinb2, newdata = zero_grid, dpar = "zi")
)
#>   surface structural_zero_probability
#> 1  litter                  0.05808118
#> 2    bare                  0.17214634

For the fitted zero-inflated model, predict(fit_zinb2, dpar = "mu") returns the conditional NB2 count mean, sigma(fit_zinb2) returns the conditional NB2 overdispersion scale, and fitted(fit_zinb2) returns the unconditional response mean (1πi)μi(1 - \pi_i)\mu_i:

new_traps <- data.frame(
  habitat = factor(c("degraded", "restored"), levels = levels(soil_counts$habitat)),
  surface = factor(c("litter", "litter"), levels = levels(soil_counts$surface)),
  moisture = c(0, 0),
  trap_nights = c(3, 3)
)

mu_hat <- predict(fit_zinb2, newdata = new_traps, dpar = "mu")
sigma_hat <- predict(fit_zinb2, newdata = new_traps, dpar = "sigma")
zi_hat <- predict(fit_zinb2, newdata = new_traps, dpar = "zi")

data.frame(
  habitat = new_traps$habitat,
  conditional_mean = mu_hat,
  sigma = sigma_hat,
  structural_zero_probability = zi_hat,
  unconditional_mean = (1 - zi_hat) * mu_hat,
  unconditional_variance =
    (1 - zi_hat) * (mu_hat + sigma_hat^2 * mu_hat^2) +
      zi_hat * (1 - zi_hat) * mu_hat^2
)
#>    habitat conditional_mean     sigma structural_zero_probability
#> 1 degraded         6.924234 0.4903942                  0.05808118
#> 2 restored        10.124436 0.6257923                  0.05808118
#>   unconditional_mean unconditional_variance
#> 1           6.522067               20.00548
#> 2           9.536397               52.95495

Use AIC only as one check, not as a substitute for the design story. A zero-inflated model needs a plausible structural-zero process such as bare soil, unsuitable host tissue, unsurveyable habitat, or true absence.

AIC(fit_nb2, fit_zinb2)
#>           df      AIC
#> fit_nb2    5 2270.559
#> fit_zinb2  7 2223.058

Current Boundary

The implemented count path is still intentionally narrow. Fixed-effect Poisson, NB2, zero-inflated Poisson, zero-inflated NB2, zero-truncated NB2, and hurdle NB2 models are fitted. Ordinary non-zero-inflated Poisson and NB2 mu models can also fit unlabelled random intercepts and independent numeric random slopes such as (1 | site) and (0 + effort | site). The zero-inflated, hurdle, truncated, and scale-side count routes remain fixed-effect only.

drmTMB(
  bf(count ~ habitat + offset(log(effort)), sigma ~ habitat, zi ~ surface),
  family = nbinom2(),
  data = dat
)

Do not teach the following as fitted count examples yet:

  • correlated or labelled NB2 mu random-slope blocks;
  • random effects in NB2 sigma or zi;
  • random effects in zero-inflated Poisson, zero-inflated NB2, truncated NB2, or hurdle NB2 count components;
  • sd(group) ~ ... random-effect scale models;
  • meta_V(V = V) or meta_known_V(V = V) with counts;
  • phylo(), spatial(), animal(), or relmat() count models;
  • bivariate or mixed-response count models such as family = c(gaussian(), nbinom2());
  • COM-Poisson underdispersion models.

Those are useful future routes, but each needs its own likelihood path, simulation recovery, diagnostics, and source-map update before it becomes tutorial syntax.