Skip to contents

Six recurring mistakes that look like bugs but aren’t, plus the one-line fix for each. Each pitfall starts with the symptom, then the diagnosis, a short reproducible chunk, and a rule of thumb.

1. Factor level order on trait

Symptom. Fitted 𝚺̂unit\hat{\boldsymbol\Sigma}_{\text{unit}} rows don’t line up with the rows of your true Lambda.

Diagnosis. factor(rep(traits, n)) defaults to sorting levels alphabetically. If your traits vector is c("length", "mass", "wing"), the model sees levels c("length", "mass", "wing"); if it is c("wing", "mass", "length"), the model still sees alphabetical order. The model is correct; the row labels are out of sync with your truth matrix.

traits <- c("length", "mass", "wing", "tarsus", "bill")

# WRONG: alphabetical levels (bill, length, mass, tarsus, wing)
levels(factor(rep(traits, 3)))
#> [1] "bill"   "length" "mass"   "tarsus" "wing"

# RIGHT: pin levels to your declared order
levels(factor(rep(traits, 3), levels = traits))
#> [1] "length" "mass"   "wing"   "tarsus" "bill"

Rule of thumb. If you defined a true Lambda with named rows, always pass levels = <your trait vector> when building the long-format factor. Otherwise the recovery comparison is meaningless even when the fit is right.

2. What extract_Sigma(level = "unit") actually returns

Symptom. Diagonals of extract_Sigma(fit, level = "unit")$Sigma look “underestimated” by exactly σε2\sigma^2_\varepsilon.

Diagnosis. extract_Sigma(level = "unit") returns the latent()-implied between-unit covariance 𝚲𝚲\boldsymbol\Lambda \boldsymbol\Lambda^{\!\top} (plus the optional unique(0 + trait | site) variance), not the marginal observational variance Var(yit)=𝚲𝚲+σε2\mathrm{Var}(y_{it}) = \boldsymbol\Lambda \boldsymbol\Lambda^{\!\top} + \sigma^2_\varepsilon. The “missing” amount is real residual noise, not bias.

set.seed(1)
sim <- simulate_site_trait(
  n_sites = 60, n_species = 1, n_traits = 3,
  mean_species_per_site = 1,
  Lambda_B = matrix(c(1.0, 0.4, -0.3,
                      0.0, 0.8,  0.5), 3, 2),
  S_B  = rep(0, 3), beta = matrix(0, 3, 2), seed = 1
)
fit <- gllvmTMB(value ~ 0 + trait + latent(0 + trait | site, d = 2),
                data = sim$data)

S_hat    <- extract_Sigma(fit, level = "unit")$Sigma
LLt_true <- tcrossprod(sim$truth$Lambda_B)
# Compare to LL^T (the right target) - not to LL^T + sigma^2 I
round(rbind(diag_LLt_true  = diag(LLt_true),
            diag_Sigma_hat = diag(S_hat)), 2)
#>                trait_1 trait_2 trait_3
#> diag_LLt_true     1.00    0.80    0.34
#> diag_Sigma_hat    1.03    0.66    0.58

Rule of thumb. Compare extract_Sigma(level = "unit") to 𝚲𝚲\boldsymbol\Lambda \boldsymbol\Lambda^{\!\top}. If you want the marginal Var(y) including residual noise, add + unique(0 + trait | site) to the formula and inspect extract_Sigma(level = "unit", part = "total")$Sigma — the residual sigma_eps is reported separately in fit$report$sigma_eps.

3. The DGP must match the fitted model

Symptom. Simulation recovery shows large positive bias on 𝚺̂unit\hat{\boldsymbol\Sigma}_{\text{unit}} diagonals, even with hundreds of sites.

Diagnosis. simulate_site_trait() ships with sensible defaults that include random environmental fixed effects (βtk𝒩(0,0.52)\beta_{tk} \sim \mathcal{N}(0, 0.5^2)) and a per-trait diagonal SBS_B. If your fitted model has no environmental term and no unique(0 + trait | site), that variance has nowhere to go and gets absorbed into 𝚲̂B\hat{\boldsymbol\Lambda}_B. This was a real harness bug caught during simulation work — see the simulation-recovery article.

# CLEAN latent()-only DGP: explicitly switch off beta and S_B
sim_clean <- simulate_site_trait(
  n_sites = 50, n_species = 1, n_traits = 3,
  mean_species_per_site = 1,
  Lambda_B = matrix(c(1.0, 0.4, -0.3,
                      0.0, 0.8,  0.5), 3, 2),
  beta = matrix(0, 3, 2),   # <-- no env effects
  S_B  = rep(0, 3),         # <-- no diagonal trait variance
  seed = 1
)

Rule of thumb. When simulating to test recovery of a latent()-only fit, set beta = matrix(0, n_traits, n_predictors) and S_B = rep(0, n_traits). If you want those components in the DGP, include them in the formula too: + (0 + trait):env_1 and/or + unique(0 + trait | site).

4. Lambda is not unique; Sigma is

Symptom. Two gllvmTMB fits on identical data return different 𝚲̂\hat{\boldsymbol\Lambda} matrices.

Diagnosis. Reduced-rank factor models have a rotational ambiguity: 𝚲\boldsymbol\Lambda and 𝚲𝐐\boldsymbol\Lambda \mathbf{Q} yield the same 𝚺unit\boldsymbol\Sigma_{\text{unit}} for any orthogonal 𝐐\mathbf{Q}. The implied covariance is identifiable; the loadings alone are not. getLoadings() will surface a one-shot hint when you access raw 𝚲\boldsymbol\Lambda on an unconstrained rank->1>1 fit.

# Sigma_unit is rotation-invariant
S_hat <- extract_Sigma(fit, level = "unit")$Sigma
round(S_hat, 2)
#>         trait_1 trait_2 trait_3
#> trait_1    1.03    0.19   -0.40
#> trait_2    0.19    0.66    0.44
#> trait_3   -0.40    0.44    0.58

# Varimax gives a unique post-hoc rotation for interpretation
rot <- rotate_loadings(fit, level = "unit", method = "varimax")
round(rot$Lambda, 2)
#>          [,1] [,2]
#> trait_1  1.01 0.02
#> trait_2  0.17 0.80
#> trait_3 -0.41 0.64

# Or ask the package to suggest a constraint matrix to refit with
sug <- suggest_lambda_constraint(
  value ~ 0 + trait + latent(0 + trait | site, d = 2),
  data = sim$data, convention = "lower_triangular"
)
sug$constraint
#>         f1 f2
#> trait_1 NA  0
#> trait_2 NA NA
#> trait_3 NA NA

Rule of thumb. Use extract_Sigma(level = "unit") (rotation-invariant) for covariances and correlations. Use rotate_loadings(fit, method = "varimax") for a unique post-hoc rotation. Use suggest_lambda_constraint() plus a refit only if you need a specific identifiable 𝚲\boldsymbol\Lambda for confirmatory testing — see the lambda-constraint article for the full decision tree.

5. phylo_scalar() vs phylo_latent()

Symptom. phylo_scalar() recovery returns a lam_phy that looks several times smaller than the true per-trait phylogenetic variance you simulated.

Diagnosis. phylo_scalar() (the propto-style term) fits a single shared scalar λ\lambda multiplying a free trait correlation matrix. Only the product λρtt\lambda \cdot \rho_{tt'} is identifiable — if your data have trait-specific phylogenetic variances, the fitted scalar can absorb part of that into the correlation matrix and look small on its own. phylo_latent(species, d = K) instead estimates a T×KT \times K phylogenetic loading matrix, giving per-trait phylogenetic effects.

# Tiny tree-based example (only the formulae matter here)
set.seed(1)
tree <- ape::rcoal(15)
tree$tip.label <- paste0("sp", seq_len(15))

# Use phylo_latent (recommended) when traits have their own phylo signal:
#   value ~ 0 + trait + phylo_latent(species, d = 2)
# Use phylo_scalar() (single-scalar) only when one shared phylo factor is
# enough and trait correlations are the inferential target:
#   value ~ 0 + trait + phylo_scalar(species)

Rule of thumb. Use phylo_latent(species, d = K) (with KK up to n_traits) whenever you expect trait-specific phylogenetic effects. Reserve phylo_scalar() for the one-scalar case where the trait correlation matrix is the target. See phylogenetic-gllvm.html for full examples.

6. The unit = "..." argument

Symptom. Error: Column site not found in data.

Diagnosis. gllvmTMB() defaults to unit = "site", but in morphometric studies the grouping is often individual; in meta-analyses it might be study_id. The engine errors when the literal column site is not present.

# Rename to mimic a morphometric dataset
df <- sim$data
df$individual <- df$site
df$site <- NULL

# WRONG (uncomment to see the error):
# gllvmTMB(value ~ 0 + trait + latent(0 + trait | individual, d = 2), data = df)
# #> Error: Column site not found in data

# RIGHT: tell the engine which column groups observations
fit_ind <- gllvmTMB(
  value ~ 0 + trait + latent(0 + trait | individual, d = 2),
  data = df, unit = "individual"
)
fit_ind$unit_col
#> [1] "individual"

Rule of thumb. If your grouping column is not literally named site, pass unit = "<your_column>" to gllvmTMB(). The morphometrics article shows this in the canonical individual × trait setup.

See also