Proportion-trait benchmark: signal strength sweep

Tree: ape::rtree(300) · Traits: 4 proportion traits per scenario · Signal: 0.2 – 1.0 · Methods: mean, BM baseline (Rphylopars), pigauto · Replicates: 5 · Missingness: 25% MCAR · Commit c48a42ce6a · Run on 2026-05-11 10:54 · Total wall: 17.7 min
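For readers unfamiliar with the setup, the 25% MCAR (missing completely at random) masking can be sketched in base R as follows. This is a minimal illustration, not the benchmark's own code: the uniform stand-in data, the seed, and the choice to mask exactly 25% of cells (rather than each cell independently with probability 0.25) are all assumptions.

```r
# Minimal sketch: mask 25% of cells completely at random (MCAR).
set.seed(1)                      # arbitrary seed, for repeatability only
n_species <- 300                 # matches the tree size, ape::rtree(300)
n_traits  <- 4                   # four proportion traits per scenario
miss_frac <- 0.25                # 25% missingness

# Hypothetical stand-in data: uniform proportions with no phylogenetic signal.
traits <- matrix(runif(n_species * n_traits),
                 nrow = n_species, ncol = n_traits)

# Pick 25% of cells uniformly at random, independent of the trait values.
n_cells  <- length(traits)
held_out <- sample.int(n_cells, size = round(miss_frac * n_cells))

observed <- traits
observed[held_out] <- NA         # an imputation method would see only this

mean(is.na(observed))            # fraction of masked cells
```

The true values at the masked cells, `traits[held_out]`, are what imputed values would be scored against.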

Bottom line. At high signal (1.0), the BM baseline reaches RMSE 0.490 and pigauto 0.491, effectively a tie. With strong phylogenetic structure the baseline already captures most of the variation, so the calibrated gate stays near zero and pigauto's predictions stay close to the baseline's.

At low signal (0.2), baseline RMSE rises to 1.134 and pigauto reaches 1.038. With weak phylogenetic structure there is limited information for any method, and mean imputation (1.023) is still slightly ahead of both, but the GNN can exploit cross-trait correlations to recover most of the baseline's loss.
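One way to picture a gate that "stays near zero" is as the weight in a convex blend of the two predictions. The sketch below assumes that form; the report does not describe pigauto's actual gating, and `blend()`, `bm_pred`, and `gnn_pred` are hypothetical names with made-up values.

```r
# Hypothetical gated blend: gate = 0 returns the BM baseline unchanged,
# gate = 1 returns the GNN prediction, values in between interpolate.
blend <- function(bm_pred, gnn_pred, gate) {
  stopifnot(gate >= 0, gate <= 1)
  (1 - gate) * bm_pred + gate * gnn_pred
}

bm_pred  <- c(0.20, 0.55, 0.80)   # made-up baseline predictions
gnn_pred <- c(0.30, 0.45, 0.90)   # made-up GNN predictions

blend(bm_pred, gnn_pred, gate = 0)     # identical to bm_pred
blend(bm_pred, gnn_pred, gate = 0.5)   # midpoint of the two predictions
```

Under this assumption, a gate near zero would make the blended RMSE nearly equal to the baseline's, which is the pattern seen at signal 1.0 (0.490 vs 0.491).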

Primary sweep: performance by phylogenetic signal (25% missingness)

Average across 4 traits and 5 replicates. ★ marks the best method per scenario.

               RMSE (lower is better)          Pearson r (higher is better)
Signal         Mean      BM        pigauto     Mean      BM        pigauto
Signal = 0.2   1.023 ★   1.134     1.038       -         0.183     0.197 ★
Signal = 0.4   1.005 ★   1.026     1.011       -         0.355 ★   0.337
Signal = 0.6   1.013     0.854 ★   0.877       -         0.576     0.588 ★
Signal = 0.8   0.986     0.715     0.710 ★     -         0.701 ★   0.700
Signal = 1.0   0.969     0.490 ★   0.491       -         0.856 ★   0.855
Figure: RMSE by phylogenetic signal. x-axis: phylogenetic signal; y-axis: RMSE; series: Mean imputation, BM baseline (Rphylopars), pigauto (BM + GNN).
Figure: Pearson r by phylogenetic signal. x-axis: phylogenetic signal; y-axis: Pearson r; series: Mean imputation, BM baseline (Rphylopars), pigauto (BM + GNN).
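The two metrics reported above can be computed per trait on the held-out cells along these lines. This is a base-R sketch with made-up values, not the driver script's code; `score_trait()` is a hypothetical helper.

```r
# Score one trait on its held-out cells: RMSE and Pearson r.
score_trait <- function(truth, pred) {
  rmse <- sqrt(mean((truth - pred)^2))
  # A constant prediction has zero variance, so Pearson r is undefined.
  r <- if (length(unique(pred)) > 1) cor(truth, pred) else NA_real_
  c(rmse = rmse, r = r)
}

truth      <- c(0.2, 0.4, 0.6, 0.8)            # made-up true values
pred_model <- c(0.3, 0.3, 0.7, 0.7)            # made-up model predictions
pred_mean  <- rep(mean(truth), length(truth))  # mean imputation: a constant

scores <- rbind(
  model    = score_trait(truth, pred_model),
  mean_imp = score_trait(truth, pred_mean)
)
print(round(scores, 3))
```

Because mean imputation predicts a single constant per trait, its Pearson r is undefined in this sketch, which would be consistent with the table listing no r value for the mean method. Averaging across traits and replicates then amounts to taking the mean of the per-trait scores.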

Secondary sweep: boundary density (signal = 0.6)

Boundary density is the proportion of simulated values near 0 or 1. The higher it is, the more the bounded [0,1] constraint matters.

Figure: RMSE by boundary density (signal = 0.6). x-axis: boundary density; y-axis: RMSE; series: Mean imputation, BM baseline (Rphylopars), pigauto (BM + GNN).

Boundary         Mean      BM        pigauto
Boundary = 0.0   0.980     0.812     0.821
Boundary = 0.1   0.996     0.977     0.976
Boundary = 0.3   0.981     1.095     0.993
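To make "boundary density" concrete, the quantity can be measured on a vector of proportions as the share of values within some tolerance of 0 or 1. The helper below is a hypothetical illustration: the tolerance `eps = 0.05` is an assumption, and the report does not say how simulate_proportion_traits() defines "near".

```r
# Share of proportions lying within eps of either bound (0 or 1).
# eps = 0.05 is an assumed tolerance, not taken from the benchmark.
boundary_density <- function(p, eps = 0.05) {
  stopifnot(all(p >= 0 & p <= 1))
  mean(p <= eps | p >= 1 - eps)
}

# Made-up proportions: 0.01, 0.97, 0.99 and 0.02 are near a bound.
p <- c(0.01, 0.50, 0.97, 0.30, 0.99, 0.60, 0.02, 0.45, 0.70, 0.20)
boundary_density(p)
```

With these ten values, four fall within 0.05 of a bound, so the sketch reports a density of 0.4.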

What the benchmark shows

Reproducibility

Driver: script/bench_proportion.R. Tree: ape::rtree(300). Traits: simulate_proportion_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_proportion.R, then Rscript script/make_bench_proportion_html.R.