Continuous-trait benchmark: BM, OU, regime shift, nonlinear

Tree: ape::rtree(300) · Traits: 4 continuous per scenario · Models: BM, OU (α = 2), regime shift, nonlinear · Methods: mean · BM baseline · pigauto · Replicates: 5 · Missingness: 25% MCAR (primary) · Commit 794537121b · Report generated 2026-05-30 12:04 · Total wall: 135.5 min

Bottom line. Under pure Brownian motion the BM baseline is near-optimal (RMSE 0.469) and pigauto stays close to it (0.478). This is the expected behavior: when the baseline is already the true model, the calibrated gate should stay near zero.

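Across the OU, regime-shift, and nonlinear models, pigauto stays close to the BM baseline. Relative to that baseline, its RMSE is 7.8% lower under OU (1.035 vs 1.123), 1.6% higher under regime shift (0.366 vs 0.360), and 6.9% higher under the nonlinear model (0.650 vs 0.608). Under OU, mean imputation has the lowest RMSE of the three methods (1.021). Read these as scenario-specific deltas, not a general dominance claim.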
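The deltas quoted above can be recomputed from the rounded RMSE values in the primary-sweep table. The sketch below is an illustration in Python, not part of the benchmark's R tooling. The function name `relative_improvement` and the sign convention (positive means pigauto's RMSE is lower than the BM baseline's) are assumptions made for this example.

```python
# RMSE values copied from the primary-sweep table (25% missingness).
rmse = {
    "BM":           {"bm": 0.469, "pigauto": 0.478},
    "OU (alpha=2)": {"bm": 1.123, "pigauto": 1.035},
    "Regime shift": {"bm": 0.360, "pigauto": 0.366},
    "Nonlinear":    {"bm": 0.608, "pigauto": 0.650},
}

def relative_improvement(baseline: float, candidate: float) -> float:
    """Percent reduction in RMSE relative to the baseline.

    Assumed convention: positive means the candidate has lower RMSE.
    """
    return 100.0 * (baseline - candidate) / baseline

deltas = {
    name: relative_improvement(v["bm"], v["pigauto"])
    for name, v in rmse.items()
}

for name, d in deltas.items():
    print(f"{name:<14} {d:+.1f}%")
```

Because the inputs are rounded to three decimals, the recomputed regime-shift delta can differ from the reported 1.6% in the last digit; the report presumably used unrounded RMSEs.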

Primary sweep: evolutionary model comparison (25% missingness)

Average across 4 traits and 5 replicates. ★ marks the best method per scenario.

Model          RMSE (lower is better)        Pearson r (higher is better)
               Mean     BM       pigauto     Mean   BM       pigauto
BM             0.997    0.469 ★  0.478       -      0.876 ★  0.869
OU (α = 2)     1.021 ★  1.123    1.035       -      0.158 ★  0.030
Regime shift   0.988    0.360 ★  0.366       -      0.926 ★  0.923
Nonlinear      0.998    0.608 ★  0.650       -      0.796 ★  0.772
Figure: Average RMSE by evolutionary model. Y-axis: RMSE (latent z-score); x-axis: evolutionary model; series: mean imputation, BM baseline, pigauto (BM + GNN). Values as in the table above.
Figure: Average Pearson r by evolutionary model. Y-axis: Pearson r; x-axis: evolutionary model; series: mean imputation, BM baseline, pigauto (BM + GNN). Values as in the table above.
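The two metrics reported in this sweep, RMSE and Pearson r on held-out cells, can be sketched as follows. This is a standard-library Python illustration with made-up toy values, not the benchmark's own code. It assumes both metrics are computed per trait on standardized (z-scored) values, which the axis label "latent z-score" suggests but the report does not spell out.

```python
import math

def rmse(truth, pred):
    """Root-mean-square error over paired held-out values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def pearson_r(truth, pred):
    """Pearson correlation; returns None when either input is constant."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_t == 0 or var_p == 0:
        return None  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_t * var_p)

# Toy held-out values for one trait (hypothetical, for illustration only).
truth = [-1.0, 0.0, 1.0, 2.0]
mean_pred = [0.5] * 4               # mean imputation: one constant per trait
model_pred = [-0.8, 0.1, 0.7, 1.9]  # some phylogeny-aware prediction

print("mean :", rmse(truth, mean_pred), pearson_r(truth, mean_pred))
print("model:", rmse(truth, model_pred), pearson_r(truth, model_pred))
```

A constant prediction has zero variance, so its correlation with the truth is undefined. That is presumably why no Pearson r is reported for mean imputation.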

Secondary sweep: RMSE vs missingness (BM + OU)

How each method degrades as the held-out fraction increases. Average across traits and replicates.

Figure: BM. RMSE (latent z-score) vs missingness (15%, 30%, 50%); series: mean imputation, BM baseline, pigauto (BM + GNN).
Figure: OU (α = 2). RMSE (latent z-score) vs missingness (15%, 30%, 50%); series: mean imputation, BM baseline, pigauto (BM + GNN).
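The missingness sweep hides a growing fraction of trait cells completely at random (MCAR). A minimal Python sketch of such masking is shown below. The function `mcar_mask` is hypothetical, and hiding an exact count of cells (rather than an independent coin flip per cell) is an assumption; the report does not say which variant the driver script uses.

```python
import random

def mcar_mask(n_species, n_traits, frac, seed):
    """Return the set of (species, trait) cells to hide.

    Cells are drawn uniformly at random without replacement, so every
    cell has the same chance of being held out regardless of its value.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n_species) for j in range(n_traits)]
    n_hide = round(frac * len(cells))
    return set(rng.sample(cells, n_hide))

# 300 tips and 4 traits, as in the benchmark; the three sweep levels.
for frac in (0.15, 0.30, 0.50):
    held_out = mcar_mask(300, 4, frac, seed=1)
    print(f"{frac:.0%} missing -> {len(held_out)} of {300 * 4} cells hidden")
```

Fixing the seed makes the mask reproducible, so every method can be scored on the same held-out cells within a replicate.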

What the benchmark shows

Reproducibility

Driver: script/bench_continuous.R. Tree: ape::rtree(300) with per-cell seeds rep × 100 + scenario_index. Traits: simulate_bm_traits() or simulate_non_bm() (4 traits per scenario). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_continuous.R, then Rscript script/make_bench_continuous_html.R.
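The per-cell seeding rule, rep × 100 + scenario_index, can be illustrated with a short Python sketch. The scenario order and the 1-based indexing of both replicates and scenarios are assumptions (R indexes from 1, but the driver script is not shown here), and `cell_seed` is a hypothetical helper name.

```python
# Assumed scenario order; the actual order in the driver script may differ.
SCENARIOS = ["BM", "OU", "regime_shift", "nonlinear"]
N_REPS = 5  # replicates, as stated in the report header

def cell_seed(rep: int, scenario_index: int) -> int:
    """Seed for one (replicate, scenario) cell: rep * 100 + scenario_index."""
    return rep * 100 + scenario_index

# Enumerate every cell with 1-based indices (an assumption).
seeds = {
    (rep, name): cell_seed(rep, idx)
    for rep in range(1, N_REPS + 1)
    for idx, name in enumerate(SCENARIOS, start=1)
}

for (rep, name), seed in sorted(seeds.items(), key=lambda kv: kv[1]):
    print(f"rep {rep}  {name:<13} seed {seed}")
```

With fewer than 100 scenarios the rule gives every cell a distinct seed, so any single cell can be rerun in isolation.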