Proportions and success rates
Source:vignettes/proportion-beta-binomial.Rmd
proportion-beta-binomial.RmdProportion data are common in ecology and evolution, but the measurement process matters. Some proportions are counts of successes out of known trials: germinated seeds out of seeds planted, infected hosts out of hosts checked, or occupied quadrats out of quadrats surveyed. Other proportions are continuous rates already measured on the unit interval, such as leaf damage, vegetation cover, or the fraction of time an animal spends foraging.
This article stays inside the currently implemented
drmTMB bounded-response surface:
- use
beta_binomial()for counted successes out of known trials; - use
beta()for continuous proportions strictly inside(0, 1); - keep structural zeros and ones as planned neighbours unless the current family already represents them through binomial sampling.
The source motivation comes from Nakagawa et al. (2026), who treat overdispersed proportions as part of the same location-scale framework as continuous and count responses. They distinguish discrete proportions from continuous proportions, recommend beta-binomial models for successes out of trials when extra-binomial variation is present, and warn that exact 0 or 1 values in continuous beta responses need explicit zero-one processes rather than being hidden by the model.
Choose The Proportion Route
Start from how the response was measured:
| Response | Example | First drmTMB family |
Boundary status |
|---|---|---|---|
| Counted successes out of trials | germinated seeds out of planted seeds | beta_binomial() |
successes can be 0 or all trials through the binomial count process |
Continuous proportion strictly inside (0, 1)
|
leaf-area damage, vegetation cover, foraging time fraction | beta() |
exact 0 or 1 is not supported by the strict beta likelihood |
| Continuous proportion with structural 0 or 1 values | 0% disease, 100% cover, complete absence or saturation | planned zero-one-inflated beta or ordered beta | not a fitted drmTMB tutorial path yet |
The distinction is not cosmetic. A denominator-aware model knows that 3 successes out of 6 trials and 30 successes out of 60 trials carry different sampling information. A strict beta model assumes the observation is already a continuous rate and that the boundaries are not part of the observed support.
Beta-Binomial Equation And Syntax
For successes out of known trials, beta_binomial()
uses
The matching drmTMB syntax is:
drmTMB(
bf(cbind(germinated, failed) ~ treatment + moisture, sigma ~ treatment),
family = beta_binomial(),
data = seed_trials
)Read each parameter before interpreting the fitted model:
| Symbol or syntax | Meaning | In the seed-germination example |
|---|---|---|
,
germinated
|
number of successes | seeds that germinated in tray |
,
germinated + failed
|
known number of trials | viable seeds planted in tray |
,
mu
|
expected success probability | germination probability for a seed in that tray |
| location coefficients on the logit probability scale | treatment and moisture effects on germination probability | |
,
sigma
|
extra-binomial scale | tray-to-tray probability variation beyond binomial sampling |
| beta precision |
phi_i = 1 / sigma_i^2, so larger sigma
means lower precision |
|
scale coefficients on the log sigma scale |
treatment effects on extra-binomial variation |
If gamma_1 is the sheltered-treatment coefficient,
then
That second ratio reverses direction because
.
In drmTMB, report sigma when you want the
direction to mean more or less modelled variation.
Seed Germination Example
Suppose seeds are planted in trays under open and sheltered microsites during a restoration trial. Shelter may increase average germination by reducing heat stress, and it may also make tray outcomes more predictable by buffering microclimate. Each tray has a known number of seeds, so the denominator belongs in the model.
This transparent simulation gives us known structure before fitting:
set.seed(197)
n <- 360
seed_trials <- data.frame(
treatment = factor(
rep(c("open", "sheltered"), each = n / 2),
levels = c("open", "sheltered")
),
moisture = as.numeric(scale(runif(n, 0.1, 0.9))),
trials = sample(18:32, n, replace = TRUE)
)
sheltered <- as.numeric(seed_trials$treatment == "sheltered")
mu_seed <- plogis(-0.55 + 0.70 * sheltered + 0.45 * seed_trials$moisture)
sigma_seed <- exp(-1.15 - 0.35 * sheltered)
phi_seed <- 1 / sigma_seed^2
tray_probability <- rbeta(
n,
shape1 = mu_seed * phi_seed,
shape2 = (1 - mu_seed) * phi_seed
)
seed_trials$germinated <- rbinom(n, size = seed_trials$trials, prob = tray_probability)
seed_trials$failed <- seed_trials$trials - seed_trials$germinated
head(seed_trials)
#> treatment moisture trials germinated failed
#> 1 open 0.79595177 24 4 20
#> 2 open -0.09180242 19 9 10
#> 3 open -1.42799682 29 6 23
#> 4 open -1.54868848 21 3 18
#> 5 open 1.65321214 30 20 10
#> 6 open 0.27586227 20 3 17Fit the beta-binomial location-scale model:
fit_seed <- drmTMB(
bf(cbind(germinated, failed) ~ treatment + moisture, sigma ~ treatment),
family = beta_binomial(),
data = seed_trials
)Run diagnostics before interpreting coefficients:
check_drm(fit_seed)
#> <drm_check: 10 checks>
#> ok: 9; notes: 0; warnings: 1; errors: 0
#> check status
#> optimizer_convergence ok
#> optimizer_budget ok
#> finite_objective ok
#> fixed_gradient warning
#> sdreport_status ok
#> hessian_positive_definite ok
#> standard_errors_finite ok
#> dropped_rows ok
#> positive_scale ok
#> fixed_effect_design_size ok
#> value
#> 0
#> iterations=26; function=37; gradient=27
#> 991.7
#> max=0.001523; component=beta_mu[1]
#> ok
#> TRUE
#> range=[0.03677,0.1156]
#> nobs=360; dropped=0
#> min=0.2520
#> total_mb=0.06031; max_cols=3; largest=mu; largest_class=matrix; largest_density=0.8333
#> message
#> nlminb convergence code is 0.
#> Optimizer evaluation counts recorded; no eval.max or iter.max control was supplied.
#> Objective and log-likelihood are finite.
#> Maximum absolute fixed gradient is > 0.001; largest component is beta_mu[1].
#> TMB::sdreport() completed successfully.
#> sdreport reports a positive-definite Hessian.
#> All fixed-effect standard errors are finite.
#> No rows were dropped by model-frame or known-covariance filtering.
#> All fitted scale values are finite and positive.
#> Dense fixed-effect design matrices are modest for this fit.The mu coefficients describe the log-odds of
germination. The sigma coefficients describe extra-binomial
variation in the latent tray-level success probability:
coef(fit_seed, "mu")
#> (Intercept) treatmentsheltered moisture
#> -0.5127846 0.6006894 0.4683619
coef(fit_seed, "sigma")
#> (Intercept) treatmentsheltered
#> -1.1893314 -0.1889192
sigma_ratio <- exp(coef(fit_seed, "sigma")["treatmentsheltered"])
c(
sigma_ratio_sheltered_vs_open = sigma_ratio,
precision_ratio_sheltered_vs_open = sigma_ratio^(-2)
)
#> sigma_ratio_sheltered_vs_open.treatmentsheltered
#> 0.8278534
#> precision_ratio_sheltered_vs_open.treatmentsheltered
#> 1.4591272For a response-scale summary, predict the expected success probability and the proportion-level standard deviation for trays with the same number of seeds:
new_seed_trays <- data.frame(
treatment = factor(c("open", "sheltered"), levels = levels(seed_trials$treatment)),
moisture = c(0, 0),
trials = c(24, 24)
)
mu_hat <- predict(fit_seed, newdata = new_seed_trays, dpar = "mu")
sigma_hat <- predict(fit_seed, newdata = new_seed_trays, dpar = "sigma")
prop_var <- mu_hat * (1 - mu_hat) *
(1 + new_seed_trays$trials * sigma_hat^2) /
(new_seed_trays$trials * (1 + sigma_hat^2))
data.frame(
treatment = new_seed_trays$treatment,
trials = new_seed_trays$trials,
expected_probability = mu_hat,
expected_successes = new_seed_trays$trials * mu_hat,
sigma = sigma_hat,
phi = 1 / sigma_hat^2,
proportion_sd = sqrt(prop_var)
)
#> treatment trials expected_probability expected_successes sigma phi
#> 1 open 24 0.374541 8.988983 0.3044247 10.79047
#> 2 sheltered 24 0.521962 12.527089 0.2520190 15.74466
#> proportion_sd
#> 1 0.1697103
#> 2 0.1570892This table keeps the two parts of the question separate: expected germination probability and extra-binomial scatter around that probability.
Strict Continuous Proportions
Use beta() when the response is a continuous proportion
strictly inside the open interval. For example, vegetation cover
measured from image analysis may record 0.03, 0.41, or 0.82 rather than
successes out of a known number of quadrats. The beta model uses
The fitted syntax is the same one-formula-per-parameter pattern:
A small transparent example:
set.seed(198)
n_cover <- 300
cover_data <- data.frame(
grazing = factor(
rep(c("ungrazed", "grazed"), each = n_cover / 2),
levels = c("ungrazed", "grazed")
),
moisture = as.numeric(scale(runif(n_cover, 0.05, 0.95)))
)
grazed <- as.numeric(cover_data$grazing == "grazed")
mu_cover <- plogis(0.35 - 0.75 * grazed + 0.35 * cover_data$moisture)
sigma_cover <- exp(-1.10 + 0.45 * grazed)
phi_cover <- 1 / sigma_cover^2
cover_data$cover <- rbeta(
n_cover,
shape1 = mu_cover * phi_cover,
shape2 = (1 - mu_cover) * phi_cover
)
fit_cover <- drmTMB(
bf(cover ~ grazing + moisture, sigma ~ grazing),
family = beta(),
data = cover_data
)
check_drm(fit_cover)
#> <drm_check: 10 checks>
#> ok: 10; notes: 0; warnings: 0; errors: 0
#> check status
#> optimizer_convergence ok
#> optimizer_budget ok
#> finite_objective ok
#> fixed_gradient ok
#> sdreport_status ok
#> hessian_positive_definite ok
#> standard_errors_finite ok
#> dropped_rows ok
#> positive_scale ok
#> fixed_effect_design_size ok
#> value
#> 0
#> iterations=28; function=38; gradient=29
#> -93.38
#> max=0.0007654; component=beta_mu[1]
#> ok
#> TRUE
#> range=[0.04484,0.09263]
#> nobs=300; dropped=0
#> min=0.3531
#> total_mb=0.05065; max_cols=3; largest=mu; largest_class=matrix; largest_density=0.8333
#> message
#> nlminb convergence code is 0.
#> Optimizer evaluation counts recorded; no eval.max or iter.max control was supplied.
#> Objective and log-likelihood are finite.
#> Maximum absolute fixed gradient is <= 0.001; largest component is beta_mu[1].
#> TMB::sdreport() completed successfully.
#> sdreport reports a positive-definite Hessian.
#> All fixed-effect standard errors are finite.
#> No rows were dropped by model-frame or known-covariance filtering.
#> All fitted scale values are finite and positive.
#> Dense fixed-effect design matrices are modest for this fit.Here, grazing changes both expected cover and the variation of cover among sampling units:
coef(fit_cover, "mu")
#> (Intercept) grazinggrazed moisture
#> 0.3732633 -0.7837726 0.3590303
coef(fit_cover, "sigma")
#> (Intercept) grazinggrazed
#> -1.0410602 0.3442394
new_cover <- data.frame(
grazing = factor(c("ungrazed", "grazed"), levels = levels(cover_data$grazing)),
moisture = c(0, 0)
)
mu_cover_hat <- predict(fit_cover, newdata = new_cover, dpar = "mu")
sigma_cover_hat <- predict(fit_cover, newdata = new_cover, dpar = "sigma")
data.frame(
grazing = new_cover$grazing,
expected_cover = mu_cover_hat,
sigma = sigma_cover_hat,
phi = 1 / sigma_cover_hat^2,
cover_sd = sqrt(
mu_cover_hat * (1 - mu_cover_hat) *
sigma_cover_hat^2 / (1 + sigma_cover_hat^2)
)
)
#> grazing expected_cover sigma phi cover_sd
#> 1 ungrazed 0.5922473 0.3530802 8.021459 0.1636107
#> 2 grazed 0.3987900 0.4981666 4.029497 0.2183348Do not use this strict beta route if the observed response includes exact 0 or 1 values. A sprayed plot with 0% damage or a quadrat with 100% cover is a boundary-generating process, not an interior beta observation. Those cases need a future zero-one-inflated beta or ordered-beta route, or a deliberate measurement-process decision outside this tutorial.
Current Boundary
The implemented bounded-response path is fixed-effect and univariate.
Use beta_binomial() for counted successes and failures, and
use beta() for a single continuous proportion that is
strictly between 0 and 1:
drmTMB(
bf(cbind(successes, failures) ~ treatment, sigma ~ treatment),
family = beta_binomial(),
data = dat
)Do not teach the following as fitted proportion examples yet:
- random effects in
muorsigma; -
sd(group) ~ ...random-effect scale models; -
meta_V(V = V), or itsmeta_known_V(V = V)compatibility alias, with beta or beta-binomial responses; -
phylo(),spatial(),animal(), orrelmat()bounded-response models; - bivariate or mixed-response bounded models such as
family = c(beta(), gaussian()); - fixed-effect
zoiorcoilikelihoods, zero-one-inflated beta, ordered beta, or beta-binomial zero-inflation; - denominator shorthand such as
successes / trialsas a replacement forcbind(successes, failures).
Those are useful future routes, but each needs likelihood code, simulation recovery, diagnostics, and source-map evidence before it becomes tutorial syntax.