pigauto 0.10.0.9000 (development)
New (opt-in): Pagel’s lambda Brownian-motion baseline
Adds first-pass support for Pagel’s lambda < 1 in the BM baseline. Spec: specs/2026-05-18-pagel-lambda-baseline-design.md (APPROVED).
New parameters:
fit_pigauto(..., lambda_mode = c("fixed_1", "estimate"))fit_baseline(..., lambda_mode = c("fixed_1", "estimate"))- kernel-layer
bm_impute_col(..., lambda)accepts numeric in[0, 1]OR the string"estimate"(in-house ML via REML profile + 11-point grid scan + local optim)
Default lambda_mode = "fixed_1" is bit-identical to v0.9.x — no silent behaviour change for existing callers.
What lambda < 1 does mathematically:
R(lambda) = lambda * R + (1 - lambda) * I
shrinks the off-diagonal of the phylogenetic correlation matrix toward zero. lambda = 1 is full BM kriging; lambda = 0 reduces the GLS predictor to the grand mean. Intermediate lambda allows partial phylogenetic shrinkage for traits with weak phylo signal.
No phylopars dependency on the lambda path. Both transform_tree_pagel (internal scaler with Pagel terminal-edge compensation) and ml_lambda_for_col (per-column ML) are in-house. When lambda_mode = "estimate", the dispatcher forces per-column BM (joint MVN / threshold-joint / OVR are gated off because phylopars model = "BM" is silently lambda = 1).
Known limitation (honest framing)
On the v0.9.4 BIEN h2h CI bench (n=2000, 30% MCAR, m=20):
- height_m: ML lambda_hat ≈ 0.95, z-RMSE 0.73 (close to RMSE-optimal)
- leaf_area: ML lambda_hat ≈ 0.83, z-RMSE 0.92 (slight regression vs lambda=1’s 0.89)
- seed_mass: ML lambda_hat ≈ 0.99, z-RMSE 0.60 (essentially unchanged)
- wood_density: ML lambda_hat ≈ 0.79, z-RMSE 0.79 (slight regression vs lambda=1’s 0.77)
- sla: ML lambda_hat ≈ 0.005 (boundary!), z-RMSE 0.98 (regresses from lambda=1’s 0.91)
The sla case is the well-known weak-phylo-signal pathology of ML for Pagel’s lambda: the profile likelihood becomes flat for small lambda when phylogenetic signal is weak, finite-sample noise dominates, and the point estimator can collapse to the boundary. The math is correct; the predictor is just suboptimal.
BACE’s MCMCglmm achieves z-RMSE ≈ 0.69 on sla because Bayesian model averaging integrates over the variance-component posterior rather than relying on a point ML estimate.
Practical guidance: keep lambda_mode = "fixed_1" (the default) unless you’ve verified on your data that ML lambda doesn’t pick degenerate boundary values. The spec’s hard target (“sla z-RMSE ≤ 0.75”) is not achieved by this release; it’s a v0.11 target via cross-validation lambda selection (spec to follow).
pigauto 0.9.4 (2026-05-18)
Wires the existing WorldClim bioclim covariate pipeline (shipped in v0.9.1.9005) into the BACE-compatible head-to-head CI bench for the BIEN plant dataset. Closes the gap surfaced by the v0.9.3 autoresearch sweep — at that scale, GNN architecture knobs were exhausted; the remaining improvement had to come from environmental covariates, not capacity.
Result
BIEN pooled per-trait log-RMSE on the h2h CI bench (n=2000 species, 30% MCAR, m=20 imputations):
baseline (full 19,109-pool, no cov, gate ON): 8.286
control (filtered 3,450-pool, no cov, gate OFF): 7.455
treatment (filtered pool, WC covariates, gate OFF): 7.236
Pool-filter effect = −0.83. Net covariate effect = −0.22. Combined gain vs v0.9.3 baseline: −1.05 (−12.7%).
Per-trait gates with covariates active:
height_m = 0.68 (raw |r| vs bioclim = 0.64)
leaf_area = 1.00 (raw |r| vs bioclim = 0.41)
wood_density = 0.40 (raw |r| vs bioclim = 0.16)
sla = 0 (raw |r| vs bioclim = 0.27; gate-closed)
seed_mass = 0 (raw |r| vs bioclim = 0.43; gate-closed)
Two of five traits (sla, seed_mass) remain gate-closed even with covariates active. sla is the expected case (low raw correlation). seed_mass is anomalous — its raw bioclim correlation is comparable to height_m’s, but the val-set gate calibration finds no benefit. Diagnosis to follow.
Changes
-
script/gha/_bace_compat.R: BIEN config now setscovariate_colsto the 38 bioclim columns (19 vars × median + IQR),phylo_signal_gate = FALSE, andexternal_cov_rdspointing to the new bioclim cache..run_bench_bace_compat()learned anexternal_cov_rdsjoin that cbinds covariates ontotraits_dfbefore subsetting and filters out species without complete bioclim coverage.cov_colsAND external-cov column names are removed from the default trait-subset so they don’t get masked + imputed as traits. - New cache:
useful/bace_data_snapshot/data/bien_worldclim_covariates.rds(19,109 species × 38 cols; 3,450 with complete coverage).
Negative results documented for the trail
The W-series sweep tested six tweaks on top of the W0 (treatment) baseline; all regressed by 0.09–0.12, except W6 which was tied within noise:
- W3 lambda_shrink 0.03 → 0.01 (+0.12)
- W4 lambda_gate 0.01 → 0.001 (+0.09; leaf_area gate collapsed)
- W5 median-only cov (drop 19 IQR cols) (+0.12)
- W6 phylo_signal_gate FALSE → TRUE (+0.01, tied)
- Run A: WorldClim + use_trait_attention=TRUE (+0.11)
W6’s tie shows the phylo_signal_gate = FALSE recommendation from specs/2026-04-24-worldclim-covariates-design.md §5.4 is unnecessary when WC covariates are active — the calibrator handles the gating either way. The config keeps FALSE for stability but the constraint is softer than the spec implied.
pigauto 0.9.3 (2026-05-18)
Small opt-in feature release from an autoresearch sweep on the BIEN plant bench. New: within-row cross-trait self-attention in the GNN (use_trait_attention, default FALSE). Eighteen knob-tweak experiments + three structural variants on BIEN n=2000 were run between v0.9.2 and v0.9.3; none improved BIEN pooled RMSE meaningfully over the default architecture. The cross-trait attention path is shipped as opt-in because it is structurally distinct and may help on datasets with strong within-row functional coupling that the joint MVN / threshold-joint baseline cannot capture; it did NOT help on BIEN (the baseline already encodes Σ for the dominant trait types).
New (opt-in)
-
fit_pigauto()andmulti_impute()acceptuse_trait_attention = FALSE(default),n_trait_heads = 2L,trait_embed_dim = 32L. WhenTRUE, the GNN’s encoder receives an additional pooled trait-context feature built by per-trait token embedding + one multi-head self-attention block over the trait sequence. Backward-compatible: saved fits without the field default toFALSEand reconstruct identically.
Documentation
- New article on the pkgdown site: “GNN architecture and the math behind pigauto” (
vignettes/gnn-architecture.Rmd). Documents the end-to-end data flow, the baseline dispatcher (BM / LP / joint MVN / threshold-joint / OVR), the transformer-block GNN with B2 rate-aware attention and the new B3 within-row cross-trait attention, the gate- and-calibrate safety mechanism, conformal prediction intervals, the three uncertainty quantification mechanisms, and the two-step Nakagawa & de Villemereuil (2019) tree-uncertainty workflow.
Autoresearch findings (BIEN n=2000)
Negative results, documented for the trail:
- More capacity hurts:
n_gnn_layers2→4,hidden_dim64→128,ffn_mult4→8,n_heads4→16, and all combos regress (+0.02 to +0.05 on pooled RMSE). - Less capacity hurts equally:
n_heads4→2,hidden_dim64→32, legacy attention path also regress. - Regularization tweaks regress:
dropout0.10→0.20,lambda_gate0.01→0.001,gate_cap0.8→1.0. - Refinement / corruption tweaks regress:
refine_steps8→16,corruption_rate0.55→0.30. - Within-row cross-trait attention (this release’s new feature) also regresses on BIEN at 32/64 embed and 2/4 heads — the joint MVN baseline already captures Σ for continuous traits.
Marginal positive: n_heads 4→8 in the CI bench config (-0.002, at-noise; bench-only).
Strategic implication: at BIEN’s scale and signal regime, further RMSE lift needs environmental covariates (WorldClim) or larger posterior tree ensembles, not GNN-architecture changes.
pigauto 0.9.2 (2026-05-17)
First release after the v0.9.1.9000 dev cycle. Headline outcome: on the BACE-compatible cross-dataset head-to-head bench (six phylogenetic-trait datasets, n=2000 per dataset, 30% MCAR, matched against daniel1noble/BACE’s snapshot results), pigauto’s net win-loss vs BACE went from 14–14 (commit 9d4782e, PR #102) to roughly 24–7 in this release. The change is driven by three bug fixes in the in-house Σ solver plus a CI evaluator alignment with BACE’s reporting policy; the underlying imputation algorithm is unchanged from v0.9.1.
Bug fixes: in-house Σ solver
fit_mvn_bm_inhouse() had three independent bugs that surfaced together on real phylogenetic data:
Henderson R⁻¹ scale mismatch.
henderson_R_inv_apply()was documented to return R⁻¹b but actually returned A⁻¹b (where A =vcv(tree), R =cov2cor(vcv(tree))). For variable-depth trees the mismatch is a per-tip rescale; for ultrametric trees it is a global scalar. Fixed by addingcor_scale = TRUEoption that pre-/post-multiplies bysqrt(diag(A)).henderson_R_inv_apply(..., cor_scale = TRUE)now matches densesolve(R)to ~1e-6 relative on non-ultrametric trees; the existingcor_scale = FALSE(A-scale) tests continue to pass unchanged.Σ init via species-iid sample covariance. The previous init used
cov(L, use = "pairwise.complete.obs")which treats species as iid and double-counts sister tips as independent observations. On strongly-phylo-conserved liability matrices this produced near-singular Σ̂ initialisations whose off-diagonals were biased toward zero. Replaced with the closed-form matrix-normal MLE Σ̂ = (1/n) L̂ᵀR⁻¹L̂ on per-column-BM-completed L̂ — properly credits R for the species correlation and recovers Σ̂ to the right order of magnitude.EM E-step diverged at near-sister tips.
build_conditional_prior()computes a within-row cross-trait conditional posterior but ignores R-mediated cross-row correlation. For near-sister tips, the proper joint posterior would strongly pull a missing tip’s value toward its observed sister; the within-row approximation cannot see that, and combined withhenderson_bm_predict()’s constant conservative variance for missing cells, the inverse-variance pool over-weights the wrong cross-trait prior. R⁻¹ at near-sister pairs (values of 4000–7500 observed at sister pairs on real AVONET) then amplifies the discrepancy and Σ̂ inflated 10–50× per EM iteration. Fixed by changing the defaultmax_iterto0Lso the solver returns the per-column BM L̂ + closed-form Σ̂ MLE without the broken cross-trait EM. Callers can still opt in to the EM viamax_iter > 0.
Net effect on bench: AVONET categorical accuracy lifted from 0.357 to 0.825 (trophic_level vs BACE 0.816) and 0.350 to 0.823 (primary_lifestyle vs BACE 0.828). PanTHERIA discrete-trait accuracies are now competitive with BACE.
Improvements: CI head-to-head evaluator
Major changes to script/gha/_bace_compat.R and script/gha/make_headtohead_report.R so pigauto’s CI numbers are directly comparable to BACE’s snapshot.
-
Pool every trait via
mi$pooled_pointto match BACE’s reporting policy. BACE uses majority-vote-across-M-imputations for categorical / ordinal accuracy androwMeans(imp_mat)for continuous RMSE. pigauto’smi$pooled_pointcarries the analogous central tendency (argmax-of-average-probability for categorical / binary, mode-or-mean for ordinal, median-or-mean for continuous depending on log-transform). Before this fix,.bace_eval_per_imputationscored every stochastic categorical sample individually — bounded byE[p_max]across cells — badly underestimating the model’s accuracy. The previous per-draw default also inflated continuous RMSE by Jensen’s inequality. A newpool_method = "auto"argument is the default; legacy"per_draw"behaviour stays available for diagnostics. -
Added
coverage_95,interval_width, andbrierto the canonical CI schema. Coverage and interval width are pigauto-only (BACE has no posterior-interval snapshot column). Brier is now computed on both sides for discrete traits so the two methods can be compared on calibration. -
h2h ordinal merge bug fixed. The previous report treated
ordinalas a continuous-family type and tried to pull RMSE from both sides, returning NA for every ordinal trait. Removed ordinal fromCONTINUOUS_TYPESso the report uses accuracy for ordinal too. AVONET migration, PanTHERIA diet_breadth / habitat_breadth now show up properly. -
Robust h2h aggregator.
load_pigauto_results()andload_bace_snapshot()now route every loaded part through.normalize_eval()so rbind across schema-version mismatches no longer crashes the aggregator. The BACE snapshot, pinned to a pre-Tier-1.5 schema, is automatically padded with NA in the new pigauto-only columns.
Improvements: CI defaults
- Default
n_imputationsinPIGAUTO_CI_CONFIGraised from 10 to 20. Mild improvement on BIEN total RMSE (8.312 → 8.288 in an autoresearch sweep of 17 settings; m=30/50/100 all regressed, so 20 is the empirical sweet spot at the current CI scale).
Reverts in dev cycle
- Tier 2 (AVONET covariates via the
cov_encoder/obs_refinepaths) was tried and reverted: CI runs showed AVONET categorical accuracy dropped sharply when covariates were passed (trophic_level 0.825 → 0.731, primary_lifestyle 0.828 → 0.657) without lifting continuous traits. The underlying covariate infrastructure stays in place — whencovariate_colsis empty the path is a no-op. Re-enabling for AVONET (or BIEN) needs a categorical-safe covariate routing first; planned for a future minor release.
Tests
- New
tests/testthat/test-bace-compat-eval.Rcovers the new pool semantics + coverage / brier computation paths. - Henderson tests extended with explicit
cor_scale = TRUEassertions on non-ultrametric trees and per-cell variance checks forhenderson_bm_predict(). - Full suite: 1511 pass / 0 fail (299 pre-existing small-n warnings unchanged).
pigauto 0.9.1.9014 (dev)
Bug fixes: prediction-time observed-cell context (2026-05-11)
-
predict.pigauto_fit()now seeds DAE refinement with stored observed latent cells and uses the phylogenetic baseline only for originally missing cells. Previously prediction initialized every cell from the baseline, so the GNN did not receive the observed trait context it was trained to use. Evaluation, cross-validation, and report metrics explicitly mask validation/test cells from that context so held-out scores remain leakage-free. - The direct
cov_linearfixed-effect path is now included in training, validation, gate calibration, conformal scoring, and prediction whenever user covariates are present. Previously that path was constructed and returned by the model, but production callers ignored it unless using the optional low-levelbaseline_muforward path. - Conformal residuals are now computed from the same calibrated three-way safety-floor blend used at prediction time, rather than reconstructing a legacy two-way BM/GNN blend from
r_cal_gnnalone.
Improvements: tree-aware downstream fitting (2026-05-11)
-
with_imputations()now carries tree metadata formulti_impute_trees()results. If the fitting callback declarestree,tree_index, orimputationarguments, they are filled with the matching posterior tree, its index, and the imputation number; the returned fits andpool_mi()output also retain the successfultree_indexvector as metadata.
Bug fixes: per-tree tree MI pooling (2026-05-11)
-
multi_impute_trees(share_gnn = FALSE)now computespooled_pointby averaging all completedT * m_per_treedatasets, matching the documented return contract. Previously this path averaged one per-tree pooled result, which could ignore within-tree draw-level variation whenm_per_tree > 1.
Bug fixes: shared-GNN tree MI pooling (2026-05-11)
-
multi_impute_trees(share_gnn = TRUE)now computespooled_pointfrom the completed imputation datasets, preservingspecies_col, user input row order, observed cells, and allm_per_treedraws. Previously this path pooled directly from raw prediction output, which could drop non-trait columns and return the point estimate in internal prediction order.
Bug fixes: shared-GNN tree baselines (2026-05-11)
-
multi_impute_trees(share_gnn = TRUE)now recomputes each posterior tree’s baseline from that tree’s own cophenetic distances and phylogenetic covariance. Previously the shared-GNN path accidentally reused the reference tree’s cached graph when fitting per-tree baselines, which could collapse the intended tree-uncertainty signal.
Bug fixes: proportion conformal intervals (2026-05-11)
-
predict.pigauto_fit()now returnsconformal_lowerandconformal_uppercolumns for explicitproportiontraits when conformal scores are available. Cross-validation/report coverage and prediction interval plots now treat proportions as numeric back-transformable traits on the original 0-1 scale.
Bug fixes: multi-proportion conformal MI (2026-05-11)
-
multi_impute(draws_method = "conformal")now generates stochastic draws formulti_proportiongroups by sampling their CLR latent columns with BM latent uncertainty and decoding back to the simplex. Previously the default conformal MI path skipped these groups because their decoded output columns are the component names rather than the group name.
Documentation: pkgdown reorganisation Phase 1 (2026-05-09)
User-facing docs reorganisation following the issues by @b1805 (#67, #68) that surfaced confusion the README and ?impute should have prevented. PR #69 fixed the function-level docstring; this release reorganises the pkgdown site itself:
- The Articles dropdown is slimmed from 39 entries to 4 source-backed vignettes. Older static walk-through HTML pages are hidden and marked archived until they are refreshed against the current README, vignettes, and benchmark evidence; the page-by-page disposition is recorded in
useful/pkgdown_page_rethink.md. - A new top-level Methodology dropdown owns benchmark HTMLs in two sub-sections: per-trait benches (7 working items) and cross-dataset benches + simulations (6 working items). Ten older bench HTMLs that are not present in
docs/dev/on the current build are documented as pending regeneration in the YAML and will re-enter the navbar once theirscript/make_*_html.Rdriver has been re-run; the Phase-2 math vignette will land in a separate PR alongside this list refresh. - New
vignettes/common-pitfalls.Rmd: five sections coveringprediction$imputedsemantics, K=3 ordinal majority-class collapse, the safety-floor closed-gate behaviour, phylogenetic signal diagnosis, andclamp_outlierstail safety. Each follows a fixed Symptom / Why / Diagnose / Fix / See-also template. -
DOCS.mdretired; README + live pkgdown site are the single source of truth for documentation navigation. README gains a one-line “Live documentation” link near the top.
No benchmark reruns or test changes.
Hardening: preprocess_traits() errors on edge-case inputs (2026-05-03)
Two silent-degradation paths surfaced by the pre-shipping coverage audit are now loud errors instead of “warn + degraded result”:
-
Zero overlap between
rownames(traits)andtree$tip.labelused to drop every trait row, emit a “tree tips have no trait data” message, and return apigauto_datawith all-NAX_scaled. Downstreamfit_pigauto()would then fail in a less-obvious place. Nowpreprocess_traits()errors immediately with"No overlap between trait row names and tree tip labels: 0 species in common.". -
Single-tip tree used to be accepted and produce a 1-row
pigauto_datawith no phylogenetic signal. Now errors with"treehas fewer than 2 tips. Phylogenetic imputation requires at least 2 species.".
No effect on any non-pathological input.
pigauto 0.9.1.9013 (dev)
Bug fixes (pre-shipping coverage audit, 2026-05-03)
Two production bugs surfaced by the pre-shipping coverage audit and fixed in PR #65:
-
suggest_next_observation(by = "species")no longer dies with"no rows to aggregate"on continuous-only traits. The species aggregator now usesstats::aggregate()’s data.frame interface (which tolerates all-NA columns) instead of the formula interface (which routes throughmodel.frame()andna.omits every row when the lhs is all NA). SeeR/active_impute.R::692. -
pool_imputations()forcountandzi_counttraits now honourspool_method = "mean"instead of silently falling back to median. A leftover-from-refactorvals <- row_median_decoded(nm)line had been overriding the pool_method-aware assignment. Thezi_countbranch additionally had a duplicate (and therefore unreachable)else if (tm$type == "zi_count")that has now been collapsed into a single branch which (a) honours pool_method on the magnitude column and (b) always probability-pools the gate column forP(non-zero). SeeR/predict_pigauto.R::809+865.
Pre-shipping coverage tests (2026-05-03)
tests/testthat/test-shipping-coverage.R (+17 test_that blocks, +68 expectations) covers:
- 9 exported functions that previously had zero direct test usage:
confusion_matrix,calibration_df,simulate_non_bm,read_tree,plot_history_gg,plot_uncertainty. - 3 non-default option paths previously untested:
multi_impute(draws_method = "mc_dropout"),fit_pigauto(use_attention = FALSE),fit_pigauto(conformal_method = "bootstrap"). - 5 edge cases: zero-overlap, no branch lengths, single-species tree, all-observed identity round-trip, all-NA column.
Documentation: clarify PMM is for multi-imputation, NOT tail safety (2026-05-01)
The bench evidence collected after PR #61 (Phase G’ PMM) and the unsuccessful Phase G’’ attempts (per-draw and post-pool conditional PMM) showed that PMM is not a tail-safety tool for pigauto. PMM in mice works because linear-regression predictions are unbiased estimators of truth across the prediction range. pigauto’s GNN can be systematically biased on rare-clade species (the AVONET Casuarius case) AND simultaneously accurate on legitimate-but-out- of-observed-range species (the AVONET seed-2032 case). PMM cannot distinguish these two cases by donor-distance alone, so it can hurt RMSE just as easily as it can help.
Where PMM genuinely helps: in multi-imputation workflows (n_imputations > 1 + pool_mi()), the donor-based between- imputation variance is well-calibrated to the trait’s marginal distribution, giving Rubin’s rules honest standard errors even when the GNN’s MC-dropout noise alone underestimates uncertainty.
This release updates the ?impute and ?predict.pigauto_fit documentation + this NEWS entry to reflect the corrected framing. No code changes.
For tail safety on single-imputation point estimates, the recommended tool remains clamp_outliers = TRUE (Phase G PR #60).
pigauto 0.9.1.9012 (dev)
Phase G’: match_observed = "pmm" – Predictive Mean Matching (2026-05-01)
Adds Predictive Mean Matching (PMM; Little 1988; Buuren & Groothuis-Oudshoorn 2011 mice) as an opt-in mechanism for multiple-imputation workflows. PMM replaces each missing cell’s back-transformed prediction with an actual observed value drawn from a donor pool ranked by predicted-value proximity.
See the v0.9.1.9013 NEWS section above for the corrected framing.
What changed
Two new arguments on impute() and predict.pigauto_fit():
-
match_observed(character, default"none"). Set to"pmm"to enable PMM. -
pmm_K(integer, default5L– mice convention). Donor pool size.
When match_observed = "pmm", for each missing cell of an at-risk trait type the function:
- Computes the predicted value via the existing BM + GNN + blend pipeline (with optional MC dropout).
- Finds the
pmm_Kobserved cells whose own predictions are closest to the missing cell’s prediction (the “donor pool”). - Samples one donor uniformly at random.
- Returns that donor’s actual observed value as the imputation.
By construction, PMM-imputed values are always in the observed data range. No extrapolation is possible.
When PMM applies
PMM is type-restricted to at-risk trait types where the back-transform amplifies tail overshoots:
- log-transformed continuous (e.g. body mass)
- count (
expm1decode) - zi_count magnitude column (same)
- proportion (mild benefit)
Discrete-class types (binary / categorical / ordinal / multi_proportion) are no-op: their decoders already return values from the observed level set by construction. Untransformed continuous: no-op (no expm1 risk).
Default policy
match_observed = "none" remains the default for backward compatibility (PR #60 added clamp_outliers opt-in; this PR adds PMM opt-in alongside it). Cross-dataset evidence in v0.9.2 may justify flipping the default to "pmm" for at-risk types.
Recommended use: * match_observed = "pmm" for log-transformed continuous traits with n_imputations > 1 (multi-imputation context, where each draw picks a different donor and Rubin’s rules give properly calibrated SEs). * clamp_outliers = TRUE, clamp_factor = 5 for users who want a deterministic point estimate without PMM stochasticity.
Implementation
-
R/pmm.R(NEW) –pmm_impute_one_trait(),apply_pmm_to_decoded(),recover_X_orig(),pmm_is_eligible(). RNG state is saved and restored around per-callset.seed()so PMM does not leak the user’s RNG state. -
R/predict_pigauto.R–predict.pigauto_fit()anddecode_from_latent()extended withmatch_observedandpmm_Karguments. PMM is applied between decode and pool, per draw. -
R/impute.R– user-facing wrapper extended with same args + roxygen. -
R/fit_pigauto.R– the fit object now retainsX_scaledsopredict()can recover original-units observed values for the donor pool. Adds ~8 * n_obs * p_latent bytes (~600 KB at AVONET full). -
tests/testthat/test-pmm.R– 5 layers of tests, 65 assertions, all pass:- L1: helper unit tests (K=1, K=5, empty cases, ties, NA preds, RNG hygiene)
- L2: type-dispatch (eligible types yes; discrete + un-log no)
- L3: statistical properties (no extrapolation, observed cells unchanged, donor membership)
- L4: integration (default = none, PMM + clamp coexist, non-eligible no-op)
- L5: edge cases (invalid K, deterministic K=1)
-
script/bench_phase_g_prime_pmm.R– AVONET 4 log-cont traits + synthetic heavy-tail bench across 3 configs (none / clamp / pmm).
pigauto 0.9.1.9011 (dev)
Phase G: opt-in clamp_outliers for back-transform tail extrapolation (2026-05-01)
Closes the AVONET Mass tail-extrapolation mode documented in useful/MEMO_2026-05-01_avonet_mass_diag.md: at N_IMP=20 on seed 2030, Casuarius bennetti was predicted at 538 kg vs truth 35 kg because all 20 MC-dropout draws produced a +3-4 SD latent overshoot that expm1() amplified to 250-1700 kg.
What changed
Two new arguments on impute() and predict.pigauto_fit():
-
clamp_outliers(logical, defaultFALSE). WhenTRUE, post-back-transform predictions for log-transformed continuous, count, and zi_count magnitude traits are capped attm$obs_max * clamp_factor(wheretm$obs_maxis the observed maximum on the original scale, recorded at preprocess time). -
clamp_factor(numeric, default5). Multiplicative factor on the observed maximum. Anything >obs_max * clamp_factoris considered implausible and capped.
For non-at-risk types (untransformed continuous, ordinal, binary, categorical, proportion, multi_proportion) the clamp is a no-op.
Default policy
clamp_outliers = FALSE remains the default for backward compatibility. Users imputing log-transformed traits with n_imputations > 1 and concerned about tail blow-ups should pass clamp_outliers = TRUE explicitly. See useful/MEMO_2026-05-01_phase_g_results.md for the AVONET acceptance bench:
| seed | clamp OFF | clamp ON | reduction |
|---|---|---|---|
| 2030 | 24,330 | 6,273 | -74 % |
| 2031 | 312 | 318 | +1.7 % (MC noise) |
| 2032 | 591 | 433 | -27 % |
Seed-2030 Casuarius prediction: 551,050 g (OFF) -> 167,847 g (ON, capped at 35,000 * 5 = 175,000).
Implementation
-
R/preprocess_traits.R: recordsentry$obs_maxandentry$obs_min(original-scale) for log-transformed continuous, count, and zi_count types. Untransformed continuous: NA (not at risk of expm1 amplification). -
R/predict_pigauto.R: extendspredict.pigauto_fit()anddecode_from_latent()withclamp_outliers/clamp_factorarguments. New.clamp_high()helper applies the cap at the three back-transform sites (continuous-log, count, zi_count magnitude). -
R/impute.R: extends the user-facing wrapper with the same arguments + roxygen.
pigauto 0.9.1.9010 (dev)
Phase H: pool_method = "mode" for ordinal traits (2026-05-01)
Closes the AVONET Migration N_IMP-dependent regression flagged in useful/MEMO_2026-05-01_phase_f_smoke_results.md.
What changed
predict.pigauto_fit() and impute() now accept a third value for pool_method:
-
"median"(default, unchanged) — per-cell median for count / proportion / zi_count magnitude; per-cellmean(integer-class) -> roundfor ordinal. -
"mean"(unchanged) — pre-v0.9.2 arithmetic-mean pooling. -
"mode"(new) — per-cell majority vote across the M draws for ordinal traits. For continuous-family traits (“mode” falls back to “median” since mode does not apply to continuous decoders). Binary, categorical, and multi_proportion are unaffected (already probability-averaged + argmax).
Why this matters
The previous ordinal pooling (mean of integer class indices, then round) biases predictions toward middle classes when MC-dropout spreads the M draws across adjacent classes. Concrete K=3 failure:
True class 1 (sedentary), 5 dropout draws produce {1,1,1,2,3}. Mean = 1.6, round = 2 -> “partial migrant” (wrong). Mode = 1 -> “sedentary” (correct).
Empirical impact on AVONET Migration (3 seeds, n=1500, miss=0.30):
| pool_method | N_IMP=1 | N_IMP=20 |
|---|---|---|
| median (default) | 0.767 +/- 0.011 | 0.713 +/- 0.032 |
| mode (new) | 0.767 +/- 0.011 | 0.779 +/- 0.011 |
Mean baseline (per-class mode): 0.800 +/- 0.013.
Switching to "mode" improves Migration accuracy at N_IMP=20 by +6.6 pp and reduces the regression vs baseline from -10.9 pp to -2.1 pp. Pre-registered acceptance criterion (Migration acc at N_IMP=20 >= N_IMP=1) passes 3 / 3 seeds.
Default policy
"median" remains the default for backward compatibility. Users running ordinal traits with n_imputations > 1 should pass pool_method = "mode" explicitly until cross-dataset evidence justifies a default flip. See useful/MEMO_2026-05-01_phase_h_results.md.
Implementation
-
R/predict_pigauto.R: extended pool_method enum; new ordinal branch usingtabulate(class_indices, nbins = K) -> which.max(). Existing median-vs-mean guards on count / proportion / zi_count branches extended to treat “mode” as “median” (fall back). -
R/impute.R: extended user-facingpool_methodenum + roxygen. -
tests/testthat/test-mi-pool-robust.R: 5 new assertions covering the mode-pool ordinal path and the median-fallback for continuous.
pigauto 0.9.1.9009 (dev)
suggest_next_observation() v2: zi_count + multi_proportion (2026-05-01)
Closes the v1.5 scope gap: suggest_next_observation() now supports all 8 pigauto trait types.
zi_count
Hybrid metric: observing a missing zi_count cell reveals the gate value (entropy reduction at the gate column, computed via the LP binary formula) AND, with probability (current LP estimate at ), reveals a magnitude (variance reduction at the magnitude column, computed via the BM Sherman-Morrison formula on the gate=1 subset). Output rows for zi_count populate BOTH delta_var_total (= probability- weighted magnitude variance reduction) AND delta_entropy_total (= gate entropy reduction). metric is set to "variance" so the row sorts on the magnitude scale; delta_entropy_total is available for users who care about gate-uncertainty separately.
multi_proportion
Per-component variance reduction summed across K CLR-z-scored latent columns. Per CLAUDE.md, multi_proportion rows are fully observed OR fully missing (no partial K-component observations), so delta_var_total is the K-component sum at each fully-missing row.
Implementation
R/active_impute.R: - New dispatch branches for zi_count and multi_proportion in suggest_next_observation(). - Reuses existing bm_variance_reduction() and lp_entropy_reduction_binary() helpers. - New needs_R_phy / needs_sim_phylo derived flags clarify which kernels each trait type needs (zi_count needs both; multi_proportion needs R_phy only).
Tests
4 new test_that blocks (tests/testthat/test-active-impute.R):
[active v2] zi_count: hybrid variance + entropy populated[active v2] zi_count: types argument can exclude zi_count[active v2] multi_proportion: per-component BM variance summed-
[active v2] multi_proportion: delta equals K-component sum(verifies via independent recomputation againstbm_variance_reduction()per component)
Total active-impute tests: 17 / 60 expects, all pass.
pigauto 0.9.1.9008 (dev)
NEW: suggest_next_observation() – active-imputation guidance (2026-04-30)
For a fitted pigauto_result (from impute()), compute the closed-form expected reduction in TOTAL predictive variance across all currently-missing cells if each candidate cell were observed next. Useful for sampling-design guidance when measurement budget is limited: which species, if observed next, would most reduce imputation uncertainty across the rest of your tree?
API
suggest_next_observation(result, top_n = 10L,
by = c("cell", "species"))Returns a pigauto_active data.frame (descending by delta_var_total). Columns:
-
by = "cell":species,trait,type,delta_var_total. -
by = "species":species,delta_var_total,n_traits_missing.
Math
Built on a Sherman-Morrison rank-1 inverse update of the BM conditional MVN: adding species s_new to the observed set updates R_oo^{-1} by a known closed form, and the variance reduction at every other missing cell is a single scalar per candidate.
Closed form per candidate cell:
delta_V_total(s_new) = sigma2 * sum_{i in miss} D[i, k]^2 / alpha_k
where D = R_mm - R_mo R_oo^{-1} R_om is the residual matrix and alpha_k = D[k, k] is the current relative leverage of cell k.
The formula assumes sigma2 (REML variance) is held fixed – the standard active-learning convention. Variance reduction depends only on the geometry of which cells are observed, not on the unknown value at s_new.
Scope (v1.5, 2026-04-30 evening)
- Continuous-family traits (
continuous,count,ordinal,proportion) – variance reduction via the BM Sherman-Morrison closed form above. - Binary + categorical traits (added 2026-04-30 evening) – expected entropy reduction via the label-propagation closed form: current LP probability has entropy ; observing updates LP linearly via the similarity kernel; expected entropy after observation is averaged over = current LP estimate at .
-
zi_countandmulti_proportionnot yet supported (silently skipped). - Single-obs only. Multi-obs deferred to v2.
Variance and entropy are not directly comparable. The default output mixes them by delta value, which is approximate across metrics. When precise ranking matters, filter by metric (“variance” or “entropy”) first.
Why this is novel
To my knowledge, no other phylogenetic-imputation package (Rphylopars, BACE, phylolm, mice with phylogenetic correlation) exposes a sampling-design helper. pigauto is uniquely positioned because its BM conditional MVN already computes per-cell predictive variance – this function just exposes the closed-form variance-reduction formula derived from rank-1 update. The methods are 30 years old in optimal-design literature (Cohn et al. 1996, JAIR 4:129–145); applying them to phylogenetic trait imputation appears to be new.
Files
-
R/active_impute.R(new) – 304 LOC, hostsbm_variance_reduction()(internal) andsuggest_next_observation()(exported), plusprint.pigauto_activeS3 method. -
R/impute.R– addstreeto thepigauto_resultsosuggest_next_observation()can recomputeR_phyif it was freed in the post-fit cleanup block. -
tests/testthat/test-active-impute.R(new) – 10 tests, 29 expects. Verifies closed-form against an independent fixed- sigma2 brute-force refit on a 20-tip fixture; descending order; by-species aggregation; multi-obs error path; discrete-skip.
NEW: gate_method = "cv_folds" – cross-validated gate calibration (2026-04-30)
Closes the longest-standing open NEWS item: “A future improvement (cross-validated gate selection, …) could close this further; not in scope for this release.”
What changed
calibrate_gates() and fit_pigauto() gain gate_method = "cv_folds" (opt-in alongside the existing "single_split" default and "median_splits" option). When selected, val cells are partitioned into gate_cv_folds = 5L (default) deterministic non-overlapping folds; the existing grid + half-B-verify procedure runs once per fold (half_a = K-1 training folds, half_b = held-out fold), and the componentwise median of K winning weight vectors becomes w_final. Post-cal full-val invariant fires unchanged.
Compared to "median_splits"
| Property | median_splits | cv_folds |
|---|---|---|
| half_a per split | 1/2 of n_val | (K-1)/K of n_val |
| Number of splits | B=31 random | K=5 deterministic |
| Overlap | Random (each cell in many splits) | None |
| Statistical interpretation | Median of B noisy gate selections | Standard k-fold CV |
Larger training sets per split → less noisy w_k. Standard CV gives a cleaner story for users and reviewers.
Motivation
The strict val-floor enforces pigauto_val <= baseline_val, but val→test drift survives in 4/32 binary cells (per useful/MEMO_2026-04-29_discrete_bench_reruns.md). Per-trait gate calibration overfits to the specific val cells because the same cells score every candidate weight. Larger training sets per split reduce overfitting; non-overlapping folds give a true held- out evaluation.
Edge cases
-
gate_cv_folds < 2errors with a clear message. -
n_val < gate_cv_foldscapsK_eff = n_valso each fold has at least 1 cell. -
n_val < 2(CV ill-defined) falls back to a single split. - Existing back-compat preserved: default
gate_method = "single_split", and the"median_splits"path is bit-identical to pre-fix.
Tests
5 new tests in tests/testthat/test-safety-floor.R (under “# CV-fold gate calibration”):
- Valid simplex output
- Deterministic for fixed seed
- Pure-MEAN snap when MEAN dominates (post-cal invariant fires)
-
n_val < Kgraceful fallback -
gate_cv_folds < 2error
All pass. Existing 24 safety-floor tests + B4 edge cases unchanged.
Smoke (AVONET 300, 60 epochs, 1 imputation, 60 Mass + 60 Migration cells masked)
| gate_method | wall (s) | r_cal_gnn[col_3] |
|---|---|---|
| single_split | 82.6 | 0.000 |
| median_splits | 77.2 | 0.050 |
| cv_folds | 75.9 | 0.150 |
cv_folds is slightly faster (smaller per-split work) and selects a non-zero GNN gate on the col where the others snap to BM.
Bench (focused 3-way comparison on synthetic mixed-type)
script/bench_cv_vs_median.R runs a focused 3-way comparison on a synthetic continuous + binary + categorical fixture (BM, lambda=1, n=300, 30 % mask, 5 reps). Test-set means across reps:
| Method | bin (acc) | cat3 (acc) | cont (RMSE) |
|---|---|---|---|
| single_split | 0.8222 | 0.7356 | 1.0778 |
| median_splits | 0.8222 | 0.7356 | 1.0720 |
| cv_folds | 0.8222 | 0.7356 | 1.0611 |
- Continuous: cv_folds reduces RMSE by 1.6 % vs single_split and by 1.0 % vs median_splits. Modest but consistent across the 5 reps (per-rep cv_folds < median_splits in 4/5 reps).
-
Discrete: identical across all three methods on this fixture. At lambda=1 the strict val-floor snaps the gate to the pure-BM corner regardless of how the gate was selected, so the three methods produce identical class assignments. This is consistent with what the discrete-bench memo (
useful/MEMO_2026-04-29_discrete_bench_reruns.md) showed: the val→test drift on 4/32 binary cells is val→test extrapolation noise, not gate-selection noise; cv_folds does not close it on this regime. The hypothesis that larger training sets per split would close MORE of the val→test drift than median_splits is not supported by this fixture. cv_folds is preferable for continuous traits but not transformative for discrete.
Full per-type bench reruns (bench_binary.R, bench_categorical.R, bench_zi_count.R at multiple signal levels) deferred to a later session.
Strict val-floor v3 — type-conditional safety floor (2026-04-29)
A type-restriction bug in R/fit_helpers.R::calibrate_gates was silently exempting binary, categorical, and zi_count traits from the post-calibration “strict val-floor” invariant. The buggy behaviour let pigauto underperform its own baseline on the package’s own simulator benches (bench_binary.md, bench_categorical.md, bench_zi_count.md):
- binary
signal_0.6mean accuracy: −4.8 pp below baseline - categorical
K_3cat2: −12 pp below baseline - zi_count
zf_0.2zi1: +23 % RMSE above baseline
These violate the safety-floor’s “pigauto cannot underperform its own baseline” promise. The bug was introduced 2026-04-23 with the initial safety-floor commit (28d8e45); the type filter mirrored an earlier continuous-only MSE evaluator and was not extended when the simplex grid landed.
The fix splits the post-calibration invariant into two type-specific blocks:
- Discrete family (binary, categorical, zi_count) gets a strict full-val-set check against both pure-BM and pure-MEAN corners; if the calibrated blend’s val-loss exceeds either corner by 1e-12, the gate is overridden to whichever pure corner is better. Discrete losses are 0-1 / argmax step functions, so this strict check is well-conditioned at typical val sizes.
- Continuous family (continuous, count, ordinal, proportion) keeps the pre-2026-04-29 mean-only check verbatim (override only if the blend loses to pure-MEAN, never to pure-BM). An intermediate fix that also added a strict pure-BM check for continuous types over-corrected on high-phylo-signal traits where the blend’s val MSE is numerically equal to pure-BM’s val MSE within sampling noise (AVONET Mass test RMSE went 386 → 659 in the over-strict variant); v3 reverts continuous to the original conservative check.
- multi_proportion is skipped (gate naturally closes via the legacy path; bench shows pigauto = baseline at every signal level).
Empirical evidence (CORRECTED 2026-04-30 per the Opus adversarial review; the original tabulation here mistakenly averaged val + test splits in the bench RDS, overstating the closure). The user-facing held-out test-set mean acc gap pigauto − baseline (lower-magnitude = closer to baseline-parity floor):
| bench | Apr 17 (pre-bug) | Apr 28 (buggy) | Apr 29 (post-fix v3) |
|---|---|---|---|
| binary | −0.029 | −0.027 | −0.014 |
| categorical | −0.031 | −0.026 | −0.016 |
| zi_count (RMSE ratio) | 1.058 | 1.043 | 1.012 |
The strict val-floor closes ~50 % of the test-side regression on binary and categorical (from −2.7 / −2.6 pp to −1.4 / −1.6 pp); the remaining ~1.4 pp is val→test sampling drift. zi_count test ratio drops from 1.06 to 1.01 – most of the regression closes, with a small residual. 3 of 32 binary cells still drift ≥ 5 pp on test (val→test extrapolation that no val-only floor can prevent).
The strict val-floor’s invariant is enforced at calibration time on the calibration-time loss surface (compute_corner_loss(g, val_row_idx, ...)). At predict time the model runs additional refine_steps and mask_token substitution that introduce a small (~5 %) drift; the tests/testthat/test-safety-floor.R strict-floor invariant test documents this with a * 1.05 slack. So the user-facing guarantee is “pigauto val-loss ≤ baseline val-loss + 1e-12 at calibration time, modulo ~5 % refine_step drift at predict time” – not the bit-exact “≤ 1e-12 everywhere” reading.
Continuous benchmarks (AVONET n=1500 single seed × N_IMP=5): the originally-reported “v3 within MC-dropout noise of pre-fix” landed on a particularly close pair of draws. Re-running the same seed × n × N_IMP three times produced Mass test RMSE = 377, 430, 504 – the comparison is non-deterministic at this evidence level (pooling order over the 5 MC dropout draws varies between R sessions). A multi-seed × N_IMP=20 verification is queued; the single-seed single-run evidence does NOT support strong claims about whether v3 changed continuous performance, only that the change (if any) is within run-to-run pooling noise on this dataset.
Per-cell variance survives in multiple binary cells (val→test sampling drift). The strict val-floor only guarantees pigauto_val_loss ≤ baseline_val_loss + 1e-12 (calibration time); it does not guarantee pigauto_test_loss ≤ baseline_test_loss. A future improvement (cross-validated gate selection, epsilon-margin tightening, or LP added as a corner option for ordinal) could close this further; not in scope for this release.
R CMD check after the fix: 0 errors / 0 warnings / 0 notes. Full devtools::test(): FAIL 0 / SKIP 2 / PASS 1081.
New regression test in tests/testthat/test-safety-floor.R: "strict val-floor: pigauto val-loss <= baseline val-loss per trait, all types" — fits on a mixed continuous + binary + categorical + count fixture, recomputes val-loss for the calibrated blend and the pure baseline, asserts blend ≤ baseline × 1.05 for every trait.
Calibration evidence preserved at script/_strict_floor_v3_calibration/ (4-way comparison: pre-fix / v1 over-strict / v2 intermediate / v3 current). Memos: useful/MEMO_2026-04-29_*.md.
Per-trait ordinal baseline path selection (Opus #6, 2026-04-30)
fit_baseline() now computes BOTH the threshold-joint baseline (B3, commit a541dbd) AND a per-column BM-via-MVN alternative for each ordinal trait, then picks the lower-val-MSE path per trait against splits$val_idx. The chosen path is exposed as fit_baseline()$ordinal_path_chosen (named character: latent col → "threshold_joint" or "bm_mvn").
Motivation: the AVONET-300 Phase 6 bench’s Migration RMSE regressed from 0.879 (Apr 16) to 0.975 (Apr 29) when B3 swapped the ordinal baseline path from a label-propagation-equivalent BM-via-MVN to the truncated-Gaussian threshold-joint approach. At K=3 ordinals the K−1 thresholds are pinned to a narrow band by phylopars EM, producing systematically worse predictions than BM-via-MVN on z-scored integer class. See useful/MEMO_2026-04-29_phase6_migration_bisect.md.
Per Opus’s adversarial review #6, we did not ship a K ≤ 3 → LP heuristic. Instead the per-trait selection runs at every K and the gate’s safety-floor simplex no longer has to route around threshold-joint output for low-K ordinals – the lower-val-MSE baseline is selected before the gate calibration sees it.
Scope: single-obs only (multi-obs ordinal selection is out of scope for this fix; it would require species-level aggregation of the BM alternative). The code path is gated by !multi_obs && !is.null(splits) and only activates when threshold-joint actually populated the ordinal column.
Two regression tests in tests/testthat/test-joint-threshold-baseline.R:
-
"fit_baseline picks per-trait ordinal path and reports it"– smoke test on K=3 ordinal + continuous mixed data; verifies finite output and that$ordinal_path_chosenis one of the two valid values. -
"fit_baseline ordinal selection picks BM when threshold-joint loses on val"– recomputes both paths’ val MSE independently and asserts the selection picked the lower one.
Adversarial-review fixes (Opus 2026-04-28)
Four correctness fixes surfaced by the Opus adversarial pass on top of PR #49 (the safety-floor / mean-gate work). All four are silent-failure modes that produced plausible-looking but invalid output; none of them threw a useful error before this release.
HIGH —
multi_impute(draws_method = "conformal")no longer collapses to zero between-draw variance on shuffled input.R/multi_impute.R::.sample_conformal_drawnow threadsinput_row_orderso it correctly maps user-input mask indices to internal (tree-tip) row indices when perturbingpred$imputed. Previously, on any data frame whose rows are not already in tree-tip order, the sampler perturbed the wrong cells andbuild_completedre-aligned the unperturbed values back into the user’s missing cells, producing M identical datasets andvar = 0after pooling. Any prior Rubin’s-rules pooling on shuffled-input multi-imputation was unreliable. Caught by a new shuffled-input variance regression test; pre-fixper_cell_sd ≡ 0, post-fix> 1e-8at every masked cell.MEDIUM (opt-in) —
conformal_split_valargument onfit_pigauto()lets users avoid the validation-set double-dipping that breaks split-conformal exchangeability. WhenTRUE, the validation cells are split per-column into a calibration half (used bycalibrate_gates()to pick the blend weights) and a conformal half (used to estimate the residual quantile). Reusing the same val cells for both selects the gate to minimise residual MSE on the very cells whose residuals drive the conformal quantile, producing systematic undercoverage that’s most visible at smalln_val(matches thecoverage_investigationmemo). Default isFALSE(pre-fix single-set behaviour) because forcing the split regresses the AVONET300 / OVR-categorical / BIEN safety-floor smoke benches by 2-26% on small-val datasets. Useconformal_split_val = TRUEwhen accurate 95% coverage matters more than bench-grade RMSE.MEDIUM —
calibrate_gateshalf-A loop is now NA-safe. Mirrors the half-B fix from commit 573decd: whenloss_a_pure_bmis non-finite (e.g. multi_proportion CLR latents whose half-A subset is too small for some component),best_lais now seeded withInfand the legacy pure-BM-relative-gain filter is skipped rather than throwingmissing value where TRUE/FALSE needed.LOW —
predict.pigauto_fitno longer produces a cryptic torch shape error when calibrated_gates is overridden with a vector of the wrong length. Length 0 is now treated as “no calibration” (degrades gracefully to the learned-gate path); any other non-matching length errors with an explicit"calibrated_gates has length k but the model has p latent column(s)"message. Unblocks the AE-attribution ablation workflow used byscript/phase1_gnn_ablation.R.
Each fix has a dedicated regression test: tests/testthat/test-multi-impute.R::"multi_impute(draws_method=\"conformal\") produces non-zero between-draw variance on shuffled input", tests/testthat/test-fit-predict.R::"conformal_split_val toggle changes conformal scores (no double-dipping)", and tests/testthat/test-fit-predict.R::"predict.pigauto_fit catches wrong-length calibrated_gates override".
pigauto 0.9.1.9006 (dev)
Per-occurrence WorldClim covariates (v1.1 follow-up to B.2)
Fixes the core limitation of PR #47 (centroid-only bioclim extraction): bioclim is now extracted at every GBIF occurrence point per species (not just the centroid), so the IQR columns carry real range-breadth signal instead of being constant zero. This is what actually lets the safety-floor calibrator see a meaningful covariate input and open the GNN gate on plants.
Validated by the honest-sim diagnostic (experiment/covariate-honest-sim) which showed pigauto’s architecture can produce 32 % RMSE lift on strong-environment traits when covariates carry real signal — something centroid-only bioclim failed to deliver on plants.
API
-
pull_gbif_centroids(..., store_points = FALSE)— new arg, defaultFALSEpreserves pre-v0.9.1.9006 cache format. WhenTRUE, persists the raw filtered lat/lon points in each species’ cache RDS, enabling per-occurrence bioclim extraction. -
pull_worldclim_per_species()— no signature change. Internally,.wc_extract_one()now prefers cached per-occurrence points, falling back to centroid only whenpointsfield is absent (legacy caches).
Back-compat
Pre-v1.1 cache RDS files remain fully readable. Users who want per-occurrence extraction upgrade their cache with one call:
pull_gbif_centroids(sp_list,
cache_dir = "script/data-cache/gbif",
store_points = TRUE,
refresh_cache = TRUE) # one-time refreshAfter that, subsequent calls are instantaneous (cache hit).
Fixture regeneration
tests/testthat/fixtures/worldclim_plants_300.rds re-generated with per-occurrence aggregation. IQR columns are now informative.
Follow-ups
- Covariate-aware phylo-signal gate — nice-to-have for interpretability (gate de-triggers when covariates resolve weak- phylo-signal traits). NOT a blocker given the safety-floor calibrator opens the gate on its own.
- Full-scale plants bench (n=4,745) — operator-run via
script/bench_bien_worldclim.R. Expected: SLA r >= 0.35, leaf_area r >= 0.30.
pigauto 0.9.1.9005 (dev)
WorldClim bioclim covariates (B.2)
New exported helper pull_worldclim_per_species(species, gbif_cache_dir, worldclim_cache_dir, ...) extracts 19 WorldClim v2.1 bioclim variables at each species’ GBIF occurrence points, aggregates per species (median + IQR), and returns a 38-column numeric data.frame ready for impute(..., covariates = ...).
What it does
- Resolves each species’s GBIF occurrence cache from
pull_gbif_centroids()(B.1). - Downloads WorldClim v2.1 10-arc-minute raster stack (~500 MB) to
worldclim_cache_diron first call; sentinel file makes subsequent calls offline. - For each species, extracts bioclim values at its centroid (v1; per-occurrence extraction deferred to v1.1) via
terra::extract(). - Aggregates to median + IQR per variable (38 columns total).
- Caches the per-species extract as one RDS in
worldclim_cache_dir/extracts/.
Important interaction with the phylogenetic-signal gate
The phylo_signal_gate introduced in v0.9.1.9003 fires on raw trait values, so traits with weak phylogenetic signal (e.g. plants SLA, leaf_area, height_m) get routed directly to the grand-mean corner BEFORE the GNN sees any covariates. To see bioclim lift those traits, users must set phylo_signal_gate = FALSE:
cov_bio <- pull_worldclim_per_species(
rownames(traits),
gbif_cache_dir = "script/data-cache/gbif",
worldclim_cache_dir = "script/data-cache/worldclim")
res <- impute(traits, tree, covariates = cov_bio,
phylo_signal_gate = FALSE) # <-- requiredA follow-up spec (“covariate-aware phylo-signal gating”) will compute lambda on residuals-after-covariates so the gate composes cleanly with covariates without the manual opt-out.
Dependency
terra added to Suggests. pull_worldclim_per_species() errors with an install hint when terra is absent.
Testing
-
tests/testthat/test-worldclim-covariates.R: 20 offline expectations (cache key, aggregation, download sentinel, extract cache, NA handling). -
tests/testthat/fixtures/worldclim_plants_300.rds: pre-extracted fixture for 300 BIEN species. - End-to-end smoke (NOT_CRAN-gated): bioclim lifts SLA or leaf_area RMSE by >= 10% on a BIEN 200-subset when combined with
phylo_signal_gate = FALSE.
Follow-ups
- True per-occurrence extraction (v1.1): extend B.1’s GBIF cache to store raw lat/lon point lists, then aggregate bioclim at each point rather than at the centroid.
- SoilGrids (B.3): add 5-10 soil variables similarly.
- Covariate-aware phylo-signal gating: recompute lambda on residuals-after-covariates so the gate de-triggers automatically when covariates resolve weak-phylo-signal traits.
pigauto 0.9.1.9004 (dev)
GBIF range-centroid covariates
New exported helper pull_gbif_centroids(species, cache_dir, ...) fetches species occurrence records from the Global Biodiversity Information Facility (GBIF) and returns a data.frame of per-species median lat/lon centroids ready for impute(..., covariates = ...).
What it does
For each species, pull_gbif_centroids():
- Resolves the taxon via rgbif::name_backbone() (handles synonyms).
- Fetches up to occurrence_limit records via rgbif::occ_search( hasCoordinate = TRUE), paginated at 300/call.
- Drops records with hasGeospatialIssues = TRUE, basisOfRecord in c(“FOSSIL_SPECIMEN”, “LIVING_SPECIMEN”), or out-of-range coordinates.
- Computes median lat / lon per species (centroids more robust than means to outliers).
- Caches one RDS per species at cache_dir (strongly recommended to avoid GBIF rate-limit issues).
Why it matters
Plant traits like SLA and leaf area are environment-driven. Even the simplest geographic covariate – a species range centroid from GBIF – carries biogeographic signal that phylogeny alone doesn’t. This is the B.1 proof-of-pipe for the pigauto covariate path; the full B.2 spec (WorldClim bioclim + SoilGrids raster extraction) follows.
End-to-end smoke (BIEN 200-subset)
Adding GBIF centroids as covariates keeps SLA and leaf_area RMSE within +10% of the no-cov baseline (ratios ~1.09 and ~1.01 at n=200). At this sample size the val-to-test sampling noise dominates any covariate lift signal; production-grade positive lift reserved for the B.2 bioclim run at larger n.
API
- pull_gbif_centroids(species, cache_dir = NULL, occurrence_limit = 500L, sleep_ms = 100L, verbose = TRUE, refresh_cache = FALSE)
- Returns data.frame with species / centroid_lat / centroid_lon / n_occurrences, rownames = species.
Dependency
rgbif added to Suggests (not Imports). pull_gbif_centroids() fails with a helpful install hint when rgbif is absent.
Testing
- tests/testthat/test-gbif-centroids.R: 32 offline expectations (cache key, aggregation, filtering, cache hit/miss, public function integration via mocks).
- End-to-end fixture at tests/testthat/fixtures/gbif_plants_300.rds (259/300 species with valid centroids) + live smoke test gated by NOT_CRAN=TRUE environment variable.
Phylogenetic-signal gate
Default change since v0.9.1.9003: impute() / fit_pigauto() now compute per-trait Pagel’s lambda on training-observed cells and route traits with lambda < 0.2 to the grand-mean corner (0, 0, 1) of the safety-floor simplex, skipping BM fitting and GNN training effort on those traits. Complements the v0.9.1.9002 safety floor with an explicit per-trait decision: “pigauto knows when phylogeny helps”.
Effect on existing benches
- Birds / mammals / fish / amphibians: zero traits gated (all lambda >= 0.2). Results bit-identical to v0.9.1.9002.
- Plants (BIEN x V.PhyloMaker2 n=500 smoke): 3 of 5 traits flagged as weak-signal (leaf_area, sla, height_m) and gated to grand mean. fit$phylo_gate_triggered reports which.
API
- impute(…, phylo_signal_gate = TRUE, phylo_signal_threshold = 0.2, phylo_signal_method = “lambda”) - new args.
- fit_pigauto() same pass-through.
- Fit object gains four new slots: phylo_signal_per_trait (named numeric lambda values), phylo_gate_triggered (named logical), phylo_signal_method, phylo_signal_threshold.
Opt-out
phylo_signal_gate = FALSE reproduces v0.9.1.9002 safety-floor-only behaviour bit-identically.
pigauto 0.9.1.9002 (dev)
Safety floor: pigauto is never worse than the grand mean
Default change: impute() / fit_pigauto() / multi_impute* now run with safety_floor = TRUE. The calibration grid post-training searches a three-way convex combination of (BM baseline, GNN delta, grand mean) on a 231-point simplex at step 0.05. Because the pure-mean corner (0, 0, 1) is always a candidate in the grid, validation RMSE is guaranteed by construction to satisfy pigauto_val_RMSE <= mean_val_RMSE on every trait. On held-out test, the invariant holds up to sampling-noise slack (tolerance +2% in the production canary, verified on five taxa by script/regress.R).
Effect on existing benches
- Birds, mammals, fish, amphibians: lift unchanged (within 1%).
- Plants: on weak-signal traits (
height_m,leaf_area,seed_mass,sla) the calibrator opensr_MEAN > 0and the prediction falls toward the grand mean, fixing the boundary-case regression where pigauto was previously 15-101% worse than mean;wood_densitylift preserved.
API
-
impute(..., safety_floor = TRUE)— new arg, defaults to TRUE. -
fit_pigauto(..., safety_floor = TRUE)— same. - The fit object gains four new slots:
r_cal_bm,r_cal_gnn,r_cal_mean(each a named numeric of lengthp_latent), andmean_baseline_per_col. -
multi_impute()/multi_impute_trees()propagate the mean term automatically viapredict.pigauto_fit(); no signature change. - Legacy v0.9.1 fit objects keep working via
%||%fallback inpredict.pigauto_fit(): when the new slots are absent, the 2-way blend(1 - r_cal) * BM + r_cal * GNNis reconstructed from the scalarr_cal.
Opt-out
safety_floor = FALSE reproduces the v0.9.1 1-D calibration bit-identically on the legacy path.
Testing
-
tests/testthat/test-safety-floor.R: smoke canary, ~3 min on MPS. 23test_thatblocks, 75 expectations covering: simplex grid, mean-baseline scalar, calibrate_gates 3-way path, fit_pigauto integration, predict 3-way blend + legacy fallback, multi_impute propagation, multi_impute_trees share_gnn propagation, AVONET300 regression (continuous RMSE within +2%, discrete accuracy within -2pp), plants safety smoke (cached BIEN n=1000, RMSE within +15% of grand mean). -
script/regress.R: full canary, ~20 min on MPS at n=1000 per taxon. Writesscript/regress_result.jsonwith per-bench pass/fail.
Files
-
R/calibration_three_way.R: new —simplex_grid(),mean_baseline_scalar()helpers. -
R/fit_helpers.R:calibrate_gates()now returns list of 3 weight vectors and optionally searches the simplex grid. -
R/fit_pigauto.R: computesmean_baseline_per_colfrom training cells; passessafety_floorthrough; stores the new slots. -
R/predict_pigauto.R: 3-way blend with%||%backward compat. -
R/impute.R: pass-through arg + roxygen. -
R/multi_impute.R,R/multi_impute_trees.R: roxygen only. -
inst/extdata/legacy_fit_v091.rds: fixture for backward-compat test. -
data-raw/make_legacy_fit_v091.R: one-shot fixture builder. -
specs/2026-04-23-safety-floor-mean-gate-design.md: design spec. -
plans/2026-04-23-safety-floor-mean-gate.md: implementation plan.
pigauto 0.9.1.9000 (dev)
Median pooling for MI on count / proportion / zi_count traits (fixes #40)
- New argument
pool_method = c("median", "mean")onimpute()andpredict.pigauto_fit(). Default"median"takes the per-cell median of the M decoded draws whenn_imputations > 1for count, proportion, and zi_count magnitude traits — robust to dropout-noisy latents that get amplified byexpm1()/plogis()decoders."mean"restores pre-v0.9.2 arithmetic-mean pooling. -
Why this matters: the PanTHERIA full-scale bench at n = 4,027 flagged a pathological RMSE blow-up on
litter_sizeatn_imputations = 20(default RMSE 7.7, em5 RMSE 60.0) — a single dropout-noisy latent draw at ~+5 SD decoded to thousands and contaminated the mean. With median pooling the RMSE returns ton_imputations = 1scale. Coverage was unaffected (conformal ≥ 0.95, MC-dropout ≥ 0.92); this is purely a point-estimate fix. -
Continuous / ordinal / binary / categorical traits are unaffected — linear or probability-averaged decoders don’t amplify latent outliers the same way.
pool_methodhas no effect on them. - New test file
tests/testthat/test-count-mi-pooling.Rguards the robustness: 5-draw synthetic with one injected outlier of 1000 on a count trait returns median = 2 (expected) but mean ≈ 202 (bug). ## PanTHERIA mammal benchmark (second real-data validation)
Second real-data benchmark for pigauto after AVONET (birds). Tests the “general-purpose phylogenetic trait imputation” claim on a different taxon with a different trait mix.
-
script/fetch_pantheria_and_tree.R— one-shot downloader. Fetches PanTHERIA (Jones et al. 2009, Ecology) from the ESA archive and builds a taxonomy-derived cladogram from the Order/Family/Genus/ Species columns viaape::as.phylo.formula()+ Grafen branch lengths. Not a time-calibrated phylogeny, but encodes the hierarchical taxonomic structure pigauto’s phylo prior uses. Self-contained; no external tree-database dependency. -
script/bench_pantheria_full.R— full-scale bench on ~4,629 mammals × 8 canonical traits (body mass, head-body length, gestation length, litter size, max longevity, diet breadth, habitat breadth, terrestriality), MCAR 30% per trait.pigauto_default+pigauto_em5vsmean_baseline. -
script/bench_pantheria_bace_head_to_head.R— stratified 500- species subset with optionalBACE::bace()for cross-package parity (gracefulrequireNamespaceskip when BACE isn’t available). -
script/make_pantheria_summary_html.R— aggregate landing page linked from the validation suite alongside the Phase 8 summary. - Two new rows in the pkgdown validation suite table for the full bench + BACE head-to-head. ## Phase 8: discriminative benchmark suite (complete)
Five reproducible bench scripts + one aggregate report, all linked from the pkgdown validation suite. No R/ code changes; bench infrastructure only.
-
script/bench_signal_sweep.R— Pagel’s λ ∈ {0.1, 0.3, 0.5, 0.7, 0.9, 1.0} × mixed-type traits (2 cont + 1 bin + 1 cat K=3) × 4 methods × 3 reps. Headline: at λ = 1.0 pigauto reaches 94.4% binary accuracy vsmean_baseline76.3%; at λ = 0.1 all methods collapse near chance (expected — no signal). -
script/bench_bace_avonet_head_to_head.R—pigauto_defaultvspigauto_em5vs optionalBACE::bace()onavonet300/tree300with identical splits. GracefulrequireNamespaceguard when BACE is unavailable. Honest finding: Phase 6 EM does NOT universally beat the plug-in at n = 300 (default 80.5% vs em5 65.9% on Trophic.Level), consistent with the “Calibration at small n” caveat from v0.9.1 — EM is opt-in for a reason. -
script/bench_correlation_sweep.R(Phase 8.1) — cross-trait correlation ρ ∈ {0, 0.2, 0.4, 0.6, 0.8} × 4 continuous traits × 3 reps. pigauto delivers consistent 6–10× RMSE reduction overmean_baselineacross all ρ (0.07–0.14 vs 0.64–0.80). Pearson r averages 0.97 at all ρ. -
script/bench_evo_model_sweep.R(Phase 8.2) — 4 evolutionary models × 3 reps. BM RMSE 0.13 (6× lift over mean); OU 0.32 (3×); regime_shift 0.07 (14×); nonlinear 0.63 (1.6× — pigauto struggles on genuinely non-BM correlations, Pearson r drops to 0.69). Honest graceful-degradation story. -
script/bench_clade_missingness.R(Phase 8.3) — realistic MAR pattern via random-clade masking on focal trait. At every target_frac tested pigauto maintains Pearson r ≥ 0.96 and RMSE 5–8× lower thanmean_baseline: 0.10→0.18 vs 0.99 (r 0.97); 0.25→0.14 vs 1.11 (r 0.97); 0.40→0.16 vs 1.16 (r 0.96). Strong validation of pigauto’s phylo prior in the realistic missingness regime (real databases undersample clades, not random species). Clade-picker caps attarget_fracto avoid degenerate 100%-masked runs. -
script/phase8_summary.html— aggregate TL;DR + sub-report links- reproducibility block (package versions,
sessionInfo()).
- reproducibility block (package versions,
- Validation suite (
pkgdown/assets/validation_suite.html) gains five new rows, one per bench, with gracefulpendingfallback on missing RDS files. ## Taxonomic-breadth benchmarks: 4 vertebrate classes + plant boundary case
Real-data benches now cover four vertebrate classes plus a plants run that establishes the scope of phylogenetic imputation. All use 30% MCAR held-out, seed 2026, and report both conformal and MC-dropout 95% coverage alongside RMSE / accuracy. The vertebrate benches consistently lift point estimates by 27-75% RMSE / 10-40 pp accuracy; the plants bench is the first honest boundary-case result, with a clean wood_density lift but r ≤ 0.21 on four other traits, attributable to weak phylogenetic signal in pooled BIEN species means and random polytomy resolution in V.PhyloMaker2 (details below). Conformal coverage still lands at 0.90-0.98 on plants, so the uncertainty story generalises across the kingdom jump even where point estimates do not.
-
Birds (
script/bench_avonet9993_bace_n3000.rds, Vulcan L40S): AVONET n=3,000 subset x 7 traits. Mass RMSE -45%, Beak -55%, Tarsus -46%, Wing -57%; Trophic.Level accuracy +16 pp; Primary.Lifestyle +13 pp. Conformal 0.94-0.96. New driverscript/bench_avonet_full_local.Rruns the full n=9,993 locally on Mac MPS (Vulcan hit a predict-stage OOM at this scale); overnight job, HTML generatorscript/make_bench_avonet_full_local_html.Rready. -
Mammals (from
feature/bench-pantheria): PanTHERIA n=4,027 x 8 traits. terrestriality accuracy 0.55 -> 0.95 (+40 pp); body_mass_g RMSE -27%, head_body_length -75%; diet/habitat breadth +10 pp each. Conformal 0.93-0.96. -
Fish (
script/bench_fishbase.rds, Mac MPS): FishBase x fishtree n=10,654 x 6 traits. Vulnerability RMSE -45%, Length -38% (r=0.81), BodyShapeI accuracy +31 pp; Troph -23%, Depth -10%; Weight improved from r=0.04 (noise) to r=0.42 under the new median-pool MI fix. -
Amphibians (
script/bench_amphibio.rds, Mac MPS): AmphiBIO n=5,237 x 2 continuous traits. Body_size_mm RMSE 89 -> 44 (-51%), Body_mass_g -27% (r=0.98). Continuous-only v1 – AmphiBIO binary / categorical columns hit a character/double type mismatch in the threshold-joint baseline, parked for v0.9.2. Calibrated gates = 0 on both traits: pigauto’s GNN correctly recognises that the Level-C joint MVN baseline is already optimal, and still beats the mean baseline by 27-51% because the BM baseline captures cross-trait correlation + phylogenetic signal. -
Plants (
script/bench_bien.R, honest boundary-case finding): BIEN x V.PhyloMaker2 at n=4,745 species, n_imputations=20. Onlywood_densitylifts (-6% RMSE, r=0.43);height_m,sla,seed_mass,leaf_areashow r ≤ 0.21 and are 15-101% worse than grand mean on RMSE. Root cause is in the data, not pigauto:- BIEN species means are pooled from heterogeneous individual observations, diluting phylogenetic signal, and (ii) V.PhyloMaker2 scenario S3 resolves within-genus polytomies randomly, so most species sit behind random branches relative to the Smith & Brown 2018 backbone. pigauto’s gate safety contains the damage: on weak-signal traits the calibrated gate closes to 0 and the pipeline degrades gracefully to the (also weak) BM prior rather than blowing up. Conformal coverage still lands at 0.90-0.98 on all 5 traits – the distribution-free uncertainty story holds even when point estimates are weak, separating the UQ narrative (always robust) from the point-estimate narrative (taxon-dependent). A first pass at
n_imputations = 1showedheight_mRMSE blowing up to 47.7 via Jensen exp()-decode bias; rerunning atn_imputations = 20activated thedc8cffamedian-pool MI correction and dropped it to 15.15. The weak- signal result is what remains after that fix – it is a property of BIEN x V.PhyloMaker2 at this species pool, not a pigauto regression. Seescript/bench_bien.htmlfor the full honest reading and the in-page Jensen note.
- BIEN species means are pooled from heterogeneous individual observations, diluting phylogenetic signal, and (ii) V.PhyloMaker2 scenario S3 resolves within-genus polytomies randomly, so most species sit behind random branches relative to the Smith & Brown 2018 backbone. pigauto’s gate safety contains the damage: on weak-signal traits the calibrated gate closes to 0 and the pipeline degrades gracefully to the (also weak) BM prior rather than blowing up. Conformal coverage still lands at 0.90-0.98 on all 5 traits – the distribution-free uncertainty story holds even when point estimates are weak, separating the UQ narrative (always robust) from the point-estimate narrative (taxon-dependent). A first pass at
Validation suite (pkgdown/assets/validation_suite.html) and the paper-draft summary (useful/results_summary.md) collect all numbers in one place.
FishBase + fishtree benchmark (vertebrate breadth triad complete)
- New
script/bench_fishbase.Rbenchmarks pigauto on the intersection offishtree::fishtree_phylogeny()(Rabosky et al. 2018) and rfishbase’sload_taxa() + species() + ecology()– 10,654 species with 6 mixed-type traits (Length, Weight, BodyShapeI, DepthRangeDeep, Vulnerability, Troph). Completes the vertebrate breadth triad: birds (AVONET), mammals (PanTHERIA), fish (FishBase + fishtree). - Full-scope result on Apple Silicon MPS (2.5 hr wall, post median- pool fix dc8cffa): Vulnerability RMSE 17.9 → 9.9 (-45%); Length RMSE 41.2 → 25.5 (-38%, r=0.81); BodyShapeI accuracy 0.46 → 0.78 (+31 pp); Troph RMSE -23%; DepthRangeDeep RMSE -10%; Weight RMSE -3% (r=0.42 vs 0.04 under mean pool).
- HTML generator:
script/make_bench_fishbase_html.R+ pkgdown mirror atpkgdown/assets/dev/bench_fishbase.html.
MI-pool robustness: median pooling for decode-amplifying traits
-
pool_imputations()now uses median pooling across decoded values for all trait types where the decode step amplifies dropout-noisy draws: log-transformed continuous (new), count (extended), zi_count (extended), and proportion (new). Untransformed continuous keeps mean pooling (backward-compat default). - Rationale: when
n_imputations > 1, a few dropout-noisy latent draws with high-tail values decode viaexpm1()/plogis()to absurd magnitudes that dominaterowMeans(). Median is the outlier-robust pooling choice that matches PR #41’s approach for count / proportion, now extended across the full set of decode-amplifying trait types. - Concrete example: on the FishBase n=10,654 bench, pigauto’s Length RMSE was 3,718 vs mean-baseline’s 41 under the old mean-pool scheme – a pure decode artefact (the calibrated gate on Length was 0). Median pool eliminates the artefact without retraining.
- Binary / categorical / ordinal pooling is unchanged (mean of probabilities / rowMeans of integer classes is correct there).
- 4 new regression tests in
test-mi-pool-robust.Rcover the four trait types that switched to median, verifying that a single extreme draw does not dominate the pooled output.
GPU memory fix at predict stage (large n)
-
fit_pigauto()now moves its returnedmodel_stateto CPU and callstorch::cuda_empty_cache()before returning, so the training run’s activations, optimizer state, and per-epoch intermediate tensors can be reclaimed by torch’s CUDA caching allocator. Prior to this fix, the fit object held GPU tensor refs that kept ~40 GB of training-time memory resident, and the downstreampredict()path OOM’d on any card with <= 80 GB at n >= 5000. -
impute()now callsgc()+torch::cuda_empty_cache()betweenfit_pigauto()andpredict()as belt-and-braces. - Reproducer: Vulcan L40S (46 GB) at n=5000 or n=9993, default or compact config; predicted in 382 MB / 192 MB / 762 MB allocations on a 43 GB-already-used GPU. All four configurations on 2026-04-21 died at the same step with the same root cause.
- No behavioural change for users: same
pigauto_fitshape,predict()output identical; just no longer OOM at scale.
Phase 7 EM: off-diagonal conditioning (opt-in, on top of Phase 6)
- New argument
em_offdiag = FALSEonimpute()andfit_baseline(). WhenTRUEANDem_iterations >= 2L, each binary/ordinal liability cell’s prior at iterationk + 1is the full conditional-MVN(mu, sd)given the posterior liability of other traits at iterationk. This means observing one discrete trait shifts (not just tightens) the prior on correlated other traits — the off-diagonal entries ofΣnow enter the E-step, instead of being ignored as in Phase 6. -
em_offdiag = FALSEdefault preserves Phase 6 behaviour byte-for-byte. -
em_offdiag = TRUEwithem_iterations < 2Lis silently clamped (iter 0 has no EM; iter 1 is plug-in with no previous Σ to condition on). - Scope: binary + ordinal only. OVR categorical stays on Phase 6 diagonal — going off-diagonal on OVR would re-couple the K classes and reintroduce the rank-(K − 1) singular-matrix instability that OVR was built to dodge.
- Missing-pattern handling: cells with all-NA other-trait posteriors fall back to the unconditional Phase 6 prior; cells with partial-NA use a restricted conditional on the observed subset.
-
em_stategainsem_offdiagfield.
Phase 6 EM for threshold-joint baseline (opt-in)
- New argument
em_iterations = 0Lonimpute()andfit_baseline(). When>= 1, the threshold-joint path (binary + ordinal + OVR categorical) iterates: the BM rateΣlearned byRphylopars::phylopars()at iterationkis fed back as the per-trait prior SD at iterationk + 1, up toem_iterationstimes (default 5 when enabled) or until||Σ_k - Σ_{k-1}||_F / ||Σ_{k-1}||_F < em_tol(default 1e-3). Closes the final open item from the v0.9.0 “Deferred to future releases” list. - Default
em_iterations = 0Lpreserves v0.9.1 output byte-for-byte. - Under independent-discrete data the EM path converges in 1-2 iters at the plug-in answer (no free lunch, no harm). Under correlated-discrete data (ρ >= 0.5), the prior tightens and discrete-trait accuracy typically gains a few pp.
- Full off-diagonal conditioning on
Σis deferred to a future Phase 7. Phase 6 usessqrt(diag(Σ))only.
pigauto 0.9.1 (2026-04-19)
pigauto 0.9.1 is a consolidation release on top of 0.9.0. Two headline improvements: (a) tree-sharing GNN is now the default in multi_impute_trees(), which makes posterior-tree multiple imputation cheap enough that the Nakagawa & de Villemereuil (2019) canonical workflow (T = 50 posterior trees × m_per_tree = 1 = M = 50 pooled datasets) is the default path at n = 10,000 species; (b) opt-in calibration smoothers (bootstrap conformal, median-over-splits gate, low-val-cells warning) for small-n regimes. This release also ships the three post-0.9.0 feature branches (B1 soft-liability, B2 rate-aware attention, B3 full-threshold ordinal baseline), scaling validation up to 10,000 species, and a clean R CMD check (0 errors / 0 warnings / 1 note, down from 1 / 4 / 3 on v0.9.0).
Calibration diagnostics and opt-in smoothing for small-n regimes
- New
fit_pigauto()argumentmin_val_cells = 10L. Warns at fit time if any trait had fewer than this many validation cells available for gate calibration and conformal-score estimation. Default of 10 is the floor of pathological territory (operational target is 20-30). - New optional
gate_method = "median_splits"runs the half-A / half-B grid search ongate_splits_B = 31Lrandom splits and takes the median best-gate. Small empirical benefit at smalln_val(gate SD 0.406 → 0.360 across 10 seeds on the calibration sim). Default stays"single_split"for backward compat. - New optional
conformal_method = "bootstrap"averages the conformal quantile acrossconformal_bootstrap_B = 500Lbootstrap resamples of the val residuals. Empirically reduces conformal-score variance ~30% but does not reliably improve 95%-interval coverage stability on the evaluation sim. Shipped as experimental; default stays"split". - Roxygen for
fit_pigauto()gains a@section Calibration at small n:explaining when the 95% intervals should be treated as approximate. Short version: at n_val < 20 the split-conformal guarantee degrades; increasemissing_fracor collect more species rather than relying on estimator smoothing.
Tree-sharing GNN is now the default in multi_impute_trees()
- New default
share_gnn = TRUEinmulti_impute_trees(). The GNN is trained once on a reference tree (MCC viaphangorn::maxCladeCred, fallbacktrees[[1]]) and reused across posterior trees, with only the BM baseline recomputed per tree. ~10-15x speedup at n=10k x T=50 makes the Nakagawa & de Villemereuil (2019) canonical workflow (T=50, m_per_tree=1, M=50 pooled via Rubin’s rules) the default path. - Default
m_per_treeflipped from5Lto1Lto align with the N&dV 2019 canonical workflow. Users withT < 20get a runtime warning suggesting they bumpm_per_treeto keep Rubin’s rules stable. - Opt-out:
share_gnn = FALSErestores the pre-v0.9.1 per-tree fit path for users who need exact per-tree model independence. - New optional arg
reference_treelets users override the MCC choice. -
predict.pigauto_fit()gains abaseline_overrideargument (mostly internal — used by the shared-GNN path). - New Suggests:
phangorn(formaxCladeCred()). - Tree-uncertainty propagation analysis: when the calibrated gate closes (the common case on real data — see the v0.9.0 validation suite), the shared-GNN approximation is lossless. When the gate is open, tree variance is slightly under-estimated in the GNN channel only — the baseline channel still carries it.
Full threshold-model ordinal baseline (B3)
- Ordinal traits now use proper interval-truncated Gaussian E-step in the Level-C baseline instead of z-scored integer passthrough. Observing “class 3 out of 5” constrains the liability to the interval [threshold_2, threshold_3] rather than treating it as a point value. The liability joins the joint Rphylopars fit alongside continuous + binary traits.
- New internal decoder:
decode_ordinal_liability()converts the Rphylopars liability posterior back to z-scored integer class for downstream GNN compatibility. -
fit_baseline()threshold-joint dispatch now fires when ordinal cols are present (in addition to binary / categorical). Ordinal cols that can’t be populated (e.g., <2 observations) fall back to per-column BM unchanged.
Rate-aware message passing via learnable per-head Gaussian bandwidth (B2)
-
GraphTransformerBlockattention bias now uses learnable per-head Gaussian bandwidth computed from raw squared cophenetic distances:bias_h = -D_sq / (2 * softplus(log_bw_h)^2). Each head can learn its own phylogenetic scale — tight for fast-evolving traits, broad for conserved traits. Replaces the fixedadj_bias_scale * log(adj)formulation. -
build_phylo_graph()returnsD_sq(squared cophenetic distances) alongsideadj. Persists through training (not freed likeD). - Backward compat: when
D_sq = NULL(pre-B2 saved fits), the oldlog(adj)path fires unchanged. Legacy attention path (use_transformer_blocks = FALSE) is completely unaffected.
Soft-liability E-step for multi-obs aggregation (B1)
- New argument
multi_obs_aggregation = c("hard", "soft")onimpute()andfit_baseline(). Default"hard"preserves v0.9.0 behaviour."soft"uses Rao-Blackwell convex-combination E-step for binary and categorical traits: a species with 6/10 class-1 observations now produces a less-extreme posterior liability than 10/10, fixing the threshold-at-0.5 / argmax-one-hot information loss from Phase 10. - New helpers:
estep_liability_binary_soft(p, mu_prior, sd_prior)andestep_liability_categorical_soft(p_vec, mu_prior, sd_prior)inR/liability.R. Both reduce to their hard counterparts at boundary (one-hot) proportions; both return the prior mean at maximum ambiguity (p = 0.5 for binary, uniform for categorical). -
aggregate_to_species()gainssoft_aggregateflag: when TRUE, preserves species-level proportions instead of thresholding/argmax. - Benchmark: binary accuracy consistently improves +1.4–2.4pp across all 3 phylogenetic-signal regimes. Categorical shows +2.0pp at high signal but mixed results at moderate/low signal (hard threshold sometimes snaps to the correct majority class by luck).
Scaling validation (up to 10,000 species)
-
AVONET 9,993 + BirdTree missingness sweep (
script/bench_avonet_missingness.R). Mean / mode vs BM baseline vs pigauto at 20% / 50% / 80% missing, run from Compute Canada (Narval). pigauto beats the BM baseline on every continuous trait at 80% missing; discrete accuracy stays within ≈1pp of BM across the sweep. Output:bench_avonet_missingness.rds+bench_avonet_missingness.md, rendered into the pkgdown validation suite. -
Scaling curve to n = 10,000 (
script/bench_scaling_v090_extended.R). Per-stage wall-clock + peak memory at n ∈ {100, 300, 1000, 3000, 5000, 7500, 10000}. Confirms sub-quadratic growth for the baseline and the expected O(n²) scaling of the GNN attention. At n = 10,000 the full pipeline completes in ~30–60 min on a single CPU node. Output: scaling table + curve PNG in the pkgdown validation suite.
Benchmarks
-
bench_covariate_sim.Rrerun on v0.9.0 (previously pinned to v0.6.0). Now the reference entry for the environmental-covariates path in the validation suite. -
bench_multi_obs_mixed.Routput regenerated with the B1 soft E-step path enabled; captures the +12–28pp categorical lift under soft aggregation vs hard.
Documentation
- New pkgdown article “Propagating Tree Uncertainty” (
vignettes/tree-uncertainty.Rmd). Decision guide for single-tree vs multi-tree MI, canonical N&dV 2019 workflow under the newshare_gnn = TRUEdefault, worked example over posterior trees withMap()-based downstream-model refitting, and a timing table. - pkgdown trait-type consistency pass:
zi_countandmulti_proportionare now listed everywhere alongside the other six trait types (Getting Started, Mixed Types, the validation suite, and the README).
Internal
-
R CMD checkclean: 0 errors / 0 warnings / 1 note, down from 1 / 4 / 3 on v0.9.0. The remaining note is theBACEin-tree “Package sources not in a standard location” flag, which is a deliberate.Rbuildignorechoice so BACE ships as a comparison reference without being part of the installed pigauto package. -
Compute Canada SLURM bundles.
submit_v090_cloud/(Narval; renamed clusters after the 2026 Alliance Canada infrastructure renewal — Cedar → Fir etc.) ships one-shot scripts for the AVONET missingness sweep and the scaling curve.submit_v090_vulcan/(PAICE accountaip-snakagaw) ships a SLURM array driver for the 60-cell calibration-coverage grid. -
pkgdown CI path filter: the pkgdown workflow now only rebuilds on changes to user-facing files (
R/,vignettes/,pkgdown/,README.md,NEWS.md,DESCRIPTION). Saves ~50% of the GitHub Actions minutes previously consumed by pigauto PR checks. -
Gap fixes: replaced a hardcoded worktree path in
bench_multi_obs_mixed.Rwith a portable relative path; refreshed the bench_multi_obs_mixed.rds.
pigauto 0.9.0 (2026-04-16)
pigauto 0.9.0 is a large release that ships two complementary upgrades: a Level-C phylogenetic baseline (captures cross-trait correlation and threshold structure), and a Graph Transformer GNN backbone (replaces the 2-layer single-head attention stack with a proper multi-head transformer). Both are gate-safe — pigauto still closes its per-trait gate when the baseline is already optimal, so high-phylo-signal datasets retain identical behaviour to v0.7.0.
Level-C baseline improvements
A new family of joint multivariate baselines replaces per-column Brownian motion + label propagation on datasets where cross-trait correlation carries signal.
Liability encoding (
R/liability.R). Every trait type now has an explicit liability interpretation: continuous types are their own liability; binary uses a 1D threshold at 0; ordinal uses K-1 thresholds; categorical uses K liabilities with an argmax constraint. A single dispatcherestep_liability(tm, observed, mu_prior, sd_prior)returns posterior mean/variance of the underlying liability given an observation. This is the foundation the joint paths below build on.Joint multivariate BM baseline (
R/joint_mvn_baseline.R). When Rphylopars is installed and the dataset has ≥2 BM-eligible latent columns, the baseline now fits a single joint multivariate BM across continuous, count, ordinal, proportion, and ZI-magnitude cols viaRphylopars::phylopars(), capturing phylogenetic correlation across traits. Benchmark on correlated simulated BM data: 33.7% RMSE lift vs per-column BM (bench_joint_baseline.R).Threshold-joint binary baseline (
R/joint_threshold_baseline.R). Binary traits get a truncated-Gaussian E-step posterior mean viaestep_liability_binaryand join the joint MVN alongside continuous traits. Decoded back tologit(P(y=1))so BCE and gate math are unchanged. Benchmark: 3/3 scenarios with ≥2pp accuracy lift over the LP baseline, peaking at +16pp at rho = 0.6 (bench_binary_joint.R).OVR categorical baseline (
R/ovr_categorical.R). Each K-class categorical trait is decomposed into K independent “is_class_k vs rest” binary threshold fits, then normalised into a row-stochastic distribution. This is the same OVR strategy BACE uses. On AVONET 300, this lifts Trophic.Level accuracy to 77% and Primary.Lifestyle to 84% — beating BACE-OVR’s 72% on both. The K-independent-fits path sidesteps the rank-(K-1) numerical instability that made earlier single-fit approaches unstable; each individual fit has only one categorical-related column.ZI-count gate routing. The binary gate col of zero-inflated count traits now joins the threshold-joint path alongside binaries (the magnitude col already rode the Phase-2 joint MVN).
Graceful fallback throughout. Any Level-C path that can’t fit (Rphylopars missing, too few observations per column, multi_proportion trait present) falls through to the legacy per-column BM + LP paths without user-visible breakage.
GNN improvements
Graph Transformer backbone (
R/graph_transformer_block.R). The 2-layer single-head attention stack is replaced byn_gnn_layersinstances of aGraphTransformerBlock— multi-head attention (default 4 heads) with learnable per-headlog(adj + eps)bias (so the phylo prior is respected by each head independently), feed-forward network (default 4× width), pre-norm, two residual skips. FFN output projection initialises to zero so each block starts near-identity, preserving pigauto’s gate-closed-at-init safety.New hyperparameters on
ResidualPhyloDAEandfit_pigauto():use_transformer_blocks = TRUE(new default),n_heads = 4L,ffn_mult = 4L. The legacy architecture is fully preserved behinduse_transformer_blocks = FALSEand reproduces pre-0.9.0 numbers exactly. Pre-0.9.0 savedpigauto_fitobjects reconstruct via the legacy path automatically.Empirical characterisation. On the 4-scenario discriminative benchmark the transformer and legacy GNN produce identical predictions in high-signal regimes (gate closes to zero — architecture below the gate is irrelevant) and the transformer shows small RMSE lift in moderate/low signal scenarios where the gate stays partly open. Runtime is ~25–30% slower per fit. Larger transformer gains likely need self-supervised pretraining on a large tree corpus (future work).
Multi-observation unlock
Multi-obs data now uses Level-C paths. Previously the joint MVN, threshold-joint, and OVR categorical paths refused multi-obs input via four
!multi_obsguards infit_baseline(). Those guards are all removed; multi-obs data is aggregated to species level inside each Level-C helper via a new internalaggregate_to_species()helper. Aggregation rules: mean for continuous-family, threshold-at-0.5 for binary/ZI-gate, argmax-one-hot for categorical. Splits are mask-then-aggregated so val/test leakage is impossible; single-obs data passes through unchanged.Benchmark (
bench_multi_obs_mixed.R). On a synthetic multi-obs mixed-type simulation (150 species, 2 continuous + 1 binary + 1 K=4 categorical, Poisson-5 obs per species), the Level-C path beats the LP path by +12–28pp categorical accuracy and +5–7.5pp binary accuracy across all three phylogenetic-signal regimes.Aggregation caveat. Binary/categorical aggregation is lossy — a species with split observations (e.g., 6/10 class-1) becomes a single class-1 observation at species level. For datasets with strong within-species variability on discrete traits, prefer single-obs mode with a user-computed species-level summary.
Benchmarks
Nine new or reran benchmark reports quantifying the release’s impact:
-
bench_joint_baseline.R— Phase-2 joint MVN A/B on correlated BM data. -
bench_binary_joint.R— Phase-3 threshold-joint A/B on binary traits. -
bench_discriminative.R+bench_discriminative_phase9.R— 4-scenario mixed-signal sweep, including the transformer vs legacy GNN A/B. -
bench_avonet_phase6.R— post-Phase-6 AVONET 300 vs LP baseline. -
bench_categorical_joint.R— OVR K-fits A/B on synthetic categorical data. -
bench_covariate_sim.R— rerun on 0.9.0. Shows −10.6% RMSE on the low-phylo-strong-env scenario vs pre-Phase-6. Pre-0.9.0 snapshot inbench_covariate_sim_preP6.md. -
bench_multi_obs.R— rerun on 0.9.0. Numbers unchanged from pre-0.9.0 (bench is single-trait so it can’t exercise Level-C dispatch conditions;bench_multi_obs_mixed.Ris the correct validator). -
bench_multi_obs_mixed.R— new synthetic mixed-type multi-obs benchmark specifically to validate Phase 10’s multi-obs unlock.
Internal
-
joint_mvn_available()gates the Level-C dispatch — returnsTRUEiff Rphylopars is installed. All Level-C paths fall back gracefully when it returnsFALSE. -
aggregate_to_species()is the sole multi-obs → species-level helper and is the right place to hook future soft-liability aggregation (currently threshold-at-0.5 for binary, argmax for categorical). - 20+ new test blocks across
test-liability.R,test-joint-threshold-baseline.R,test-joint-baseline.R,test-ovr-categorical.R,test-graph-transformer-block.R,test-phase9-integration.R, andtest-multiobs-levelc.R. Full suite is 778 PASS, 0 FAIL.
Deferred to future releases
- Soft-liability E-step for multi-obs aggregation (addresses the threshold-at-0.5 lossiness documented above).
- Rate-aware message passing in the transformer blocks.
- Full threshold-model ordinal baseline (ordinal currently rides the Phase-2 joint MVN via its z-scored integer encoding).
- Self-supervised pretraining of the transformer backbone on a large tree corpus.
- CRAN submission preparation (BACE in-tree separation, torch handling, absolute-path cleanup in
script/).
pigauto 0.7.0
New trait type: multi_proportion (compositional data)
multi_proportion: a new trait type for compositional data — K columns per row that sum to 1 (e.g. plumage-colour proportions, diet composition, microbiome relative abundances, allele frequencies). Declared via a newmulti_proportion_groups = list(colour = c("black", ..., "yellow"))argument toimpute(),preprocess_traits(), andmulti_impute().Encoding: centred log-ratio (CLR) + per-component z-score. Small epsilon (
1e-6) is applied to zero components before log to keep the transform safe; rows are re-normalised so the composition is preserved.Baseline: K independent Brownian-motion imputations on the CLR columns. BM is well-defined on CLR space (Euclidean), unlike the raw simplex.
GNN output: K logits projected to sum-zero, then softmax at decode time — guaranteed to lie on the simplex.
Loss: MSE on CLR columns (averaged across K), applied group-wise so the whole composition is masked together during DAE corruption.
Metrics: Aitchison distance (the natural compositional-data metric), RMSE on CLR space (comparable to continuous traits), and simplex MAE (mean absolute error on decoded proportions).
Benchmark:
script/bench_multi_proportion.Rwith 5-scenario signal sweep and K ∈ {3, 5, 8, 12} secondary sweep. Report rendered byscript/make_bench_multi_proportion_html.R.Tests:
tests/testthat/test-multi-proportion.R(5 test blocks, 20 expectations) — encoding, validation, baseline, end-to-endimpute().
Internal
evaluate_imputation()output adds three columns:aitchison,rmse_clr,simplex_mae. Existing metrics columns are unchanged.impute()gains explicittrait_typesandmulti_proportion_groupsarguments (previouslytrait_typeswas documented as passed through...but was not actually forwarded; this fixes that).
pigauto 0.6.2
Documentation
-
Tree uncertainty workflow clarified.
multi_impute_trees(),trees300, the README, andvignette("getting-started")now all describe the two-step workflow explicitly: step 1 is tree-aware imputation (pigauto’s job — already done correctly); step 2 is tree-aware downstream analysis (the user’s responsibility, following Nakagawa & de Villemereuil 2019, Syst. Biol. 68:632–641). Every doc now includes a completeMap()-over-mi$tree_indexcode example showing how to run the corresponding tree in the downstream model. -
Compute-cost table added to the tree-uncertainty docs: wall-clock budgets at
n= 300 / 5,000 / 10,000 species withT= 10 / 50 trees, plus guidance on reducingTfor large trees and parallelising across machines.
pigauto 0.6.1
Bug fixes
MC dropout zero variance fixed (
R/predict_pigauto.R): whenn_imputations > 1, the blend equation now uses a BM posterior drawt_BM_draw ~ N(BM_mu, BM_se)instead of the deterministict_MU. When the calibrated gate is zero (BM dominates), each imputation now samples from the correct BM posterior rather than returning the same point estimate, giving non-zero between-imputation variance.Conformal draw scale bug fixed (
R/multi_impute.R): for log-transformed traits (Mass, Wing length, etc.), draws are now made on the latent z-score scale and back-transformed, avoiding near-zero variance caused by dividing a log-scale SE by an original-scale value. The same fix applies to log1p-scaled counts and logit-scaled proportions.
New features
-
draws_method = "conformal"is now the default formulti_impute(). Draws sample from the split-conformal calibrated uncertainty distribution (better calibrated than MC dropout for downstream Rubin’s-rules pooling). -
draws_method = "mc_dropout"remains available and is now correct (see bug fix above).
pigauto 0.6.0
Observation-level covariate refinement for multi-obs data
When datasets have multiple observations per species measured under different conditions (e.g. CTmax at different acclimation temperatures), the GNN now produces covariate-conditional predictions within species. A new refinement MLP (obs_refine) re-injects user-supplied covariates after species-level phylogenetic message passing, so that different observations of the same species receive different predicted values based on their covariate context. This is controlled by a residual connection that preserves the phylogenetic signal from message passing.
-
R/model_residual_dae.R: newn_user_covparameter andobs_refineMLP (activated only whenspecies_col+covariatesare both supplied) -
R/fit_pigauto.R:n_user_covstored in model config -
R/predict_pigauto.R: backward-compatible model reconstruction
New bundled dataset: ctmax_sim
Simulated multi-observation-per-species CTmax data (1,464 observations across 300 species, with acclimation temperature as an observation-level covariate). 30% of species are entirely unobserved. Uses tree300. See data-raw/make_ctmax_sim.R for the generation script.
New benchmark: multi-observation imputation
script/bench_multi_obs.R simulates CTmax-like datasets under varying phylogenetic signal and within-species acclimation response ratios, comparing species_mean, pigauto (no covariates), and pigauto (with covariates). The observation-level refinement gives measurable RMSE improvement when within-species covariate effects are strong.
pigauto 0.5.0
Three sources of information
pigauto now explicitly combines three sources of information for imputation: (1) the phylogenetic tree, (2) cross-trait correlations, and (3) optional environmental covariates. The covariates argument is accepted by impute(), multi_impute(), multi_impute_trees(), and the lower-level pipeline functions. Covariates are threaded through the GNN with the same gated safety that protects against GNN degradation — they only contribute when they demonstrably improve accuracy.
Internal BM baseline (Rphylopars no longer required)
The Brownian-motion baseline for continuous, count, ordinal, and proportion traits is now computed internally using conditional multivariate normal imputation on the phylogenetic correlation matrix R = cov2cor(vcv(tree)) (Goolsby et al. 2017). This removes the hard dependency on Rphylopars, which is now in Suggests only. The internal implementation (R/bm_internal.R) uses univariate per-column imputation with Cholesky decomposition and nugget regularisation for near-singular submatrices. Validation against Rphylopars shows r = 0.97–0.98 (expected given univariate vs multivariate difference).
Tree-uncertainty MI via multi_impute_trees()
New function multi_impute_trees() performs multiple imputation across a posterior sample of phylogenetic trees, so that phylogenetic uncertainty propagates into downstream standard errors via Rubin’s rules. Demonstrated with the bundled trees300 dataset (10 BirdTree posterior trees). Benchmark results show SE inflation of 1.1–2.1x and fraction of missing information (FMI) rising from ~0.03 (single-tree) to 0.22–0.79 (multi-tree) depending on missingness level.
New bundled datasets
-
trees300: 10 posterior trees from the BirdTree Hackett backbone, pruned to the same 300 tips astree300 -
delhey5809: plumage lightness and 4 environmental covariates (absolute latitude, mean temperature, mean precipitation, tree cover) for 5,809 bird species (Delhey 2019) -
tree_delhey: matching BirdTree phylogeny fordelhey5809
Benchmark suite
Thirteen benchmark reports are now accessible from the pkgdown site under Articles > Benchmarks:
- Per-type benchmarks: continuous, binary, ordinal, count, categorical, proportion, zero-inflated counts
- Missingness mechanisms: MCAR vs MAR (trait/phylo) vs MNAR
- Tree uncertainty: single-tree vs multi-tree MI
- Environmental covariates: Delhey 5,809-species real data (high phylo signal, covariates give ~zero lift) and simulation sweep (4 lambda x beta scenarios; 8–15% RMSE reduction when env effects are strong, ~0% when phylo signal dominates)
Each report is a self-contained HTML page generated from script/make_bench_*_html.R and backed by reproducible script/bench_*.R drivers.
pigauto 0.4.0
Breaking: TabPFN baseline removed
fit_baseline_tabpfn() and setup_tabpfn() are gone, along with the reticulate suggested dependency, the script/bench_tabpfn.* files, script/run_tabpfn.py, tests/testthat/test-tabpfn.R, and the dev/bench_tabpfn.html benchmark page. The TabPFN wrapper was originally added as a curiosity — a test of whether a general-purpose tabular transformer could compete with pigauto as a drop-in baseline. In practice it is a poor fit for what pigauto is trying to do:
- It is not phylogenetic. TabPFN treats rows as exchangeable and cannot use the tree at all. The whole point of pigauto is that the tree carries information about trait evolution; feeding traits through a tree-blind model throws that away.
-
Quadratic attention does not scale. TabPFN’s attention is O(n²) in the number of rows. At
n = 9,993species (the full AVONET dataset) this is already near the edge of what a single GPU can handle, and most users of pigauto care precisely about scales at which a baseline must be cheap. -
The cross-language subprocess architecture was fragile. R
torchand Pythontorchsharelibtorchand cannot coexist in the same process, so TabPFN had to run viasystem2()on a separately-installed Python venv. Every time the user’s Python,tabimpute, ortorchversions drifted, the wrapper broke. This is not maintenance the project can afford. - Keeping the benchmark page gave a misleading impression of relevance. Users who hit the “pigauto vs TabPFN” page came away thinking TabPFN was a first-class supported baseline in pigauto. It never was.
Users who still want to benchmark pigauto against TabPFN can install tabimpute separately and call it directly from Python. The removal simplifies pigauto’s installation, CI, and documentation.
Docs cleanup (carried over from v0.3.3 work)
-
DOCS.md: rewritten benchmark and developer-reports sections to reflect the v0.3.2+ benchmark set (scaling to 10,000 tips and the AVONET missingness sweep). Dropped the obsoletebenchmark_reportandbenchmark_final_reportlinks that v0.3.2 had already pruned from_pkgdown.ymlbut had not yet removed fromDOCS.md. -
CLAUDE.md: removed the TabPFN Python-venv setup recipe and the “R torch ↔︎ Python torch cannot share a process” gotcha (no longer applicable). Version line bumped to 0.4.0.
pigauto 0.3.3
Correctness fix: Vphy must be a correlation matrix in glmmTMB’s propto()
The multiple-imputation example in README.md, the two HTML tutorials (pigauto_intro.html, pigauto_workflow_mixed.html), and tests/testthat/test-multi-impute.R all constructed the phylogenetic random-effect structure with
Vphy <- ape::vcv(tree) # WRONG: covariance, diag = tree height
glmmTMB(... + propto(0 + species | dummy, Vphy), ...)propto() estimates sigma^2 freely, so passing the raw covariance matrix (whose diagonal is the tree height) silently rescales the phylogenetic variance component by tree height. The correct usage passes a correlation matrix:
All four sites are now fixed. Thanks to the user who flagged this.
Honest framing: the GNN is a gated ensemble, not a residual model
The user-facing prose throughout the package historically described pigauto as a Residual Phylogenetic Denoising Autoencoder (ResidualPhyloDAE) that learns a residual correction on top of a BM baseline. This wording was misleading for binary and categorical traits in two ways:
- The “baseline” for those types is phylogenetic label propagation (a similarity-weighted average of observed labels using a Gaussian kernel on the cophenetic distance) — not a generalised linear model that produces residuals on a link scale.
- The GNN is trained end-to-end by minimising the type-appropriate loss (MSE for continuous/count/ordinal, BCE for binary, cross-entropy for categorical) on the blended output, not on
y - baseline. So the GNN’s outputdeltais a full per-cell prediction, not a statistical residual.
What pigauto actually is, in honest terms, is a gated ensemble of two predictors:
- A phylogenetic baseline (Brownian motion via Rphylopars for continuous/count/ordinal traits; phylogenetic label propagation for binary/categorical traits).
- A graph neural network correction (an internal torch module that produces a full per-cell prediction
deltafrom spectral node features and an attention-biased message passing over the phylogeny).
Prediction is the per-trait blend (1 - r_cal) * baseline + r_cal * delta, where r_cal is calibrated on a held-out validation split (with a split-validation cross-check for discrete traits). When the baseline is already optimal the gate closes and the GNN becomes a no-op.
The internal torch class name ResidualPhyloDAE is preserved because its GNN layers use ResNet-style residual skip connections — that is a correct, well-defined ML term. But user-facing prose no longer describes the GNN as “learning a residual on top of the baseline”.
Files updated: README.md, DESCRIPTION, CLAUDE.md, _pkgdown.yml, DOCS.md, vignettes/getting-started.Rmd, vignettes/mixed-types.Rmd, R/fit_pigauto.R (roxygen), R/predict_pigauto.R (roxygen), R/model_residual_dae.R (clarifying header comment), R/simulate_traits.R, R/plot.R (plot titles), and script/make_intro_html.R.
Tracking issue for v0.4.0 architectural work
The honest fix above is a documentation correction, not a model change. A genuinely residual baseline for binary and ordinal traits would require a liability-scale model (threshold BM, phylolm::phyloglm, or MCMCglmm with family = "threshold") — a design decision that should be made deliberately rather than rolled into a same-day patch. A v0.4.0 tracking issue lays out the trade-offs.
pigauto 0.3.2
New bundled dataset: avonet_full / tree_full
The full AVONET3 + BirdTree Stage2 Hackett MCC phylogeny, aligned to 9,993 species with 7 mixed-type traits (4 continuous morphometric + 2 categorical + 1 ordinal) and native missingness preserved. Complements the 300-species avonet300 / tree300 bundled data:
-
avonet300/tree300remain the right choice for quick examples, unit tests, and the getting-started vignette. -
avonet_full/tree_fullare the right choice for scale benchmarks, realistic-missingness experiments, and anything that wants to exercise the v0.3.1 sparse Lanczos / cophenetic caching code paths on a real-world phylogeny.
The schemas are identical, so any code that runs on avonet300 runs on avonet_full with no modification. Combined tarball increment is ~328 KB (xz-9).
New benchmark: AVONET missingness sweep (20 / 50 / 80%)
Articles -> Benchmarks -> AVONET missingness sweep is a new pkgdown page comparing three methods on the full avonet_full dataset at three MCAR missingness levels:
- mean / mode — column mean (continuous, ordinal) or class frequency (categorical). No phylogeny.
- BM baseline — Rphylopars Brownian-motion for continuous / ordinal, phylogenetic label propagation for discrete traits.
-
pigauto — full pipeline (BM baseline + calibrated GNN delta
- conformal intervals).
Per-trait RMSE, Pearson r, Spearman rho, and accuracy on held-out test cells. Single seed per cell, trait-level masking so categorical traits are held out as whole groups. Hyperparameters match the validate_avonet_full.R scaling run for comparability.
Docs cleanup
- Regenerated
pigauto_introandpigauto_workflow_mixedHTML tutorials against the v0.3.2 package state. - Removed stale v0.3.0-era benchmark pages from the navbar (
benchmark_report.html,benchmark_final_report.html,test_report.html) and frompkgdown/assets/dev/. - Pruned 45 orphan files from
script/(trial artefacts from pre-v0.3.1 benchmarking phases:bench_v2/v3/v4.*,benchmark_avonet2000.*,benchmark_final.*,benchmark_simulation.*, scaling duplicates, and Apr-6 demo artefacts). -
README.mdcitation now points at 0.3.2. -
DESCRIPTIONupdated to mention both bundled datasets.
pigauto 0.3.1
Scaling to 10,000 tips
pigauto now runs end-to-end on real 10,000-tip phylogenies. Before this release the pipeline was tested and tuned around the 300-species bundled AVONET data, and at 10,000 tips it wedged on a single dense eigendecomposition of the graph Laplacian (hours of CPU time). Two targeted fixes land in this release; no rewrite was needed.
Fix A: sparse Lanczos eigensolver for build_phylo_graph()
build_phylo_graph() now uses RSpectra::eigs_sym() sparse Lanczos to compute the k + 1 smallest Laplacian eigenvectors when the tree has more than 7,500 tips and RSpectra is installed. This replaces base::eigen(L, symmetric = TRUE), which was O(n^3) in time and discarded >99% of its work (we only need the bottom ~32 eigenvectors).
- On the real 9,993-species AVONET + BirdTree Stage2 Hackett MCC phylogeny, the graph stage now takes under a minute, compared with roughly seven minutes of dense eigen() on the same hardware.
- The old dense path is kept as the default for smaller trees and as a safety fallback for when RSpectra is unavailable or Lanczos does not converge. The threshold is conservative (7,500) because highly symmetric ultrametric simulations (
ape::rcoal) have degenerate spectra that need large Krylov subspaces to resolve. -
RSpectra is listed in
Suggests. Installing it is strongly recommended for any tree larger than ~7,500 tips.
Fix B: cache the cophenetic distance matrix across the pipeline
The cophenetic distance matrix D = ape::cophenetic.phylo(tree) was previously computed four separate times in a single impute() call (three times inside build_phylo_graph() and once inside fit_baseline()). Each call allocates a dense n*n double matrix and runs an O(n^2) tree traversal – wasted work at scale. In 0.3.1:
-
build_phylo_graph()computesDexactly once and returns it as part of the result list (graph$D). -
fit_baseline()gains an optionalgraphargument; when supplied, it reusesgraph$Dinstead of recomputing the cophenetic matrix. -
impute()andfit_pigauto()pass the graph through automatically, so users get the speedup without any code changes. -
impute()also explicitly releasesgraph$Dafterfit_baseline()finishes, so the ~800 MB cophenetic matrix does not sit in R memory during training (a subtle side-effect that caused a large per-epoch regression at n = 10,000 until it was dropped).
Scaling benchmark
See pkgdown/assets/dev/scaling.html (linked from the site as Articles -> Benchmarks -> Scaling to 10,000 tips) for a side-by-side comparison of v0.3.0 vs v0.3.1 at n in {300, 1k, 2k, 3k, 5k, 7.5k, 10k}, plus an end-to-end validation run on the full AVONET3 + BirdTree dataset (9,993 species, 7 mixed-type traits). The pkgdown article also walks through both a quick 300-species example with the bundled avonet300 data and a full 10k-species example using the BirdTree Stage2 Hackett MCC phylogeny.
Bug fixes
-
build_phylo_graph(): the N > 2000 memory warning was out of date (the real blocker above that size was wall-clock time, not memory). The warning is now issued at N > 10000 and its text points at the true bottleneck.
pigauto 0.3.0
Major new features
Attention-based message passing
The GNN now uses learned attention weights (with phylogenetic adjacency as positional bias) instead of fixed adjacency matrix multiplication. The model learns which phylogenetic neighbours are most informative for each species, while the log-adjacency bias ensures initialisation respects evolutionary distances. Controlled via use_attention = TRUE (now the default).
Validation-calibrated gates
After training, per-trait gate values are optimised on the validation set via grid search. This prevents the GNN from adding noise when the Brownian motion baseline is already optimal. On the AVONET benchmark, this eliminates the RMSE regression seen in earlier versions.
Conformal prediction intervals
Distribution-free 95% prediction intervals computed via split conformal prediction on the validation residuals. Coverage is guaranteed under exchangeability. Available for continuous, count, and ordinal traits via pred$conformal_lower and pred$conformal_upper.
Phylogenetic label propagation
Binary and categorical trait baselines now use phylogenetic similarity- weighted label propagation instead of flat population-level frequencies. Each species gets a personalised baseline reflecting its phylogenetic neighbourhood.
Adaptive spectral encoding
k_eigen now defaults to "auto", scaling with tree size: min(max(ceil(n/20), 4), 32). This gives 4 features for small trees, 15 for 300 tips, and 32 for 640+ tips.
Auto-generated HTML reports
pigauto_report() produces self-contained HTML reports with interactive Chart.js visualisations: per-trait metrics, calibrated gate values, training history, and conformal coverage.
Evaluation framework
-
evaluate(): per-trait metrics on held-out data -
compare_methods(): automated BM vs GNN comparison -
cross_validate(): proper k-fold cross-validation -
summary.pigauto_fit(): formatted console summary
Plotting
-
plot.pigauto_fit(): training history, gate values, conformal scores -
plot.pigauto_pred(): scatter plots, conformal intervals, probability distributions -
plot_comparison(): forest-plot style method comparison
Simulation benchmark
simulate_benchmark() runs a complete simulation study with configurable scenarios (BM, OU, regime shift, non-linear, mixed types), returning tidy results with print/summary/plot methods. Useful for methods papers and for verifying pigauto’s behaviour on data with known properties.
Minor improvements
- Categorical gate calibration now works at the trait level (all K one-hot columns share a single gate) using cross-entropy loss, not per-column MSE
-
build_phylo_graph()defaultk_eigenchanged from8Lto"auto" -
fit_pigauto()defaultk_eigenchanged from8Lto"auto" -
all_grads_finite()now handles non-standard tensor types gracefully -
impute()uses adaptive k_eigen by default - Package passes
R CMD checkwith 0 errors and 0 notes - Bumped version to 0.3.0
