Tree: ape::rtree(300)
Traits: 4 proportion traits per scenario
Signal: 0.2 – 1.0
Methods: mean · BM baseline (Rphylopars) · pigauto
Replicates: 5
Missingness: 25% MCAR
Commit: c48a42ce6a
Run on: 2026-05-11 10:54
Total wall time: 17.7 min
Bottom line. At high signal (1.0), the BM baseline reaches RMSE 0.490 and pigauto 0.491, a practical tie. Strong phylogenetic structure means the baseline captures most of the variation, and the calibrated gate stays near zero.
At low signal (0.2), baseline RMSE rises to 1.134 and pigauto reaches 1.038; both trail simple mean imputation (1.023). With weak phylogenetic structure there is little information for any method, but the GNN can still exploit cross-trait correlations, which is consistent with pigauto's lower error relative to BM.
Primary sweep: performance by phylogenetic signal (25% missingness)
Averaged across 4 traits and 5 replicates. ★ marks the best method for each scenario and metric.
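RMSE: lower is better. Pearson r: higher is better.

| Signal | RMSE: Mean | RMSE: BM | RMSE: pigauto | Pearson r: Mean | Pearson r: BM | Pearson r: pigauto |
|---|---|---|---|---|---|---|
| Signal = 0.2 | 1.023 ★ | 1.134 | 1.038 | – | 0.183 | 0.197 ★ |
| Signal = 0.4 | 1.005 ★ | 1.026 | 1.011 | – | 0.355 ★ | 0.337 |
| Signal = 0.6 | 1.013 | 0.854 ★ | 0.877 | – | 0.576 | 0.588 ★ |
| Signal = 0.8 | 0.986 | 0.715 | 0.710 ★ | – | 0.701 ★ | 0.700 |
| Signal = 1.0 | 0.969 | 0.490 ★ | 0.491 | – | 0.856 ★ | 0.855 |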
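As a rough illustration of how the two metrics in this table can be computed on held-out cells, here is a minimal base-R sketch. The function names (rmse, score_heldout) and the toy data are hypothetical and are not taken from the benchmark driver. The sketch also shows one plausible reason for the dash in the Mean column: mean imputation predicts a constant, so Pearson r is undefined for it.

```r
# Hypothetical helpers for illustration; not part of the pigauto code base.
rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))

score_heldout <- function(truth, pred, mask) {
  # mask is TRUE for cells that were hidden from the imputer
  t_held <- truth[mask]
  p_held <- pred[mask]
  # Pearson r is undefined when all predictions are identical
  r <- if (length(unique(p_held)) > 1) cor(t_held, p_held) else NA_real_
  c(rmse = rmse(t_held, p_held), pearson_r = r)
}

set.seed(1)
truth <- rnorm(200)                                # toy "true" trait values
mask  <- seq_along(truth) %in% sample(200, 50)     # hide 25% of the cells
pred_mean  <- rep(mean(truth[!mask]), 200)         # mean imputation
pred_noisy <- truth + rnorm(200, sd = 0.5)         # stand-in for a model

score_heldout(truth, pred_mean, mask)   # pearson_r is NA
score_heldout(truth, pred_noisy, mask)
```

Scoring only the masked cells keeps the comparison fair, since every method sees the same observed values.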
Secondary sweep: boundary density (signal = 0.6)
Proportion of values near 0 or 1. Higher boundary density makes the bounded [0,1] constraint more relevant.
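Boundary density as described here can be sketched in a few lines of base R. The threshold eps = 0.05 and the function name boundary_density are assumptions for illustration; the benchmark's actual cutoff for "near 0 or 1" is not stated on this page.

```r
# eps is an assumed cutoff; the benchmark may define "near" differently.
boundary_density <- function(p, eps = 0.05) {
  mean(p <= eps | p >= 1 - eps, na.rm = TRUE)
}

set.seed(1)
p_central <- rbeta(1000, shape1 = 5,   shape2 = 5)    # mass concentrated near 0.5
p_edges   <- rbeta(1000, shape1 = 0.3, shape2 = 0.3)  # mass piled up near 0 and 1

boundary_density(p_central)
boundary_density(p_edges)
```

The U-shaped Beta(0.3, 0.3) sample gives a much higher density than the bell-shaped Beta(5, 5) sample, which is the contrast the secondary sweep varies.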
What the benchmark shows
Proportions are treated as continuous traits in latent space. The pipeline z-scores proportions directly. The [0,1] boundary is respected at prediction time via clamping after back-transformation.
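The z-score, back-transform, and clamp steps described above can be sketched as follows. This is a minimal base-R illustration under the assumption of one mean and one standard deviation per trait; the helper names (zscore_fit, to_latent, from_latent) are hypothetical and do not come from the package.

```r
# Hypothetical helpers; assumes a single mean/sd per trait.
zscore_fit <- function(p) {
  list(mu = mean(p, na.rm = TRUE), sigma = sd(p, na.rm = TRUE))
}
to_latent <- function(p, s) (p - s$mu) / s$sigma
from_latent <- function(z, s) {
  p <- z * s$sigma + s$mu   # undo the z-score
  pmin(pmax(p, 0), 1)       # clamp to the valid [0, 1] range
}

p_obs <- c(0.02, 0.10, 0.45, 0.90, 0.98)
s <- zscore_fit(p_obs)
z <- to_latent(p_obs, s)

z_pred <- c(-2.5, 0, 2.5)   # made-up latent predictions
from_latent(z_pred, s)      # extremes are clamped to 0 and 1
```

Clamping guarantees valid proportions, but any prediction that overshoots the boundary collapses to exactly 0 or 1.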
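Signal strength dominates performance. BM baseline RMSE falls from 1.134 at signal 0.2 to 0.490 at signal 1.0. At high phylogenetic signal the BM baseline captures most of the variation because closely related species have similar proportions. The GNN gate stays near zero, so pigauto's predictions stay close to the baseline's.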
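To make the role of the gate concrete, here is a minimal sketch that assumes the final prediction is a convex blend of the baseline and the GNN output. That functional form, the function name blend, and the numbers are assumptions for illustration; this page does not specify how the gate is actually applied.

```r
# Assumed form: prediction = baseline + gate * (gnn - baseline), gate in [0, 1].
blend <- function(baseline, gnn, gate) {
  stopifnot(gate >= 0, gate <= 1)
  baseline + gate * (gnn - baseline)
}

baseline <- c(0.20, 0.55, 0.80)   # made-up BM predictions
gnn      <- c(0.30, 0.50, 0.70)   # made-up GNN predictions

blend(baseline, gnn, gate = 0)      # identical to the baseline
blend(baseline, gnn, gate = 0.05)   # small correction toward the GNN
```

Under this assumption, a gate near zero would explain why the BM and pigauto scores nearly coincide at high signal.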
Boundary density affects difficulty. When many values cluster near 0 or 1, the distribution is skewed and BM (which assumes Gaussian residuals) can struggle. The GNN can learn the non-linear boundary effects.
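pigauto is usually close to the baseline and sometimes improves on it. Average RMSE gains are modest in this run: the largest is at signal 0.2 (1.038 vs 1.134), while at signal 0.6 pigauto trails BM (0.877 vs 0.854). Individual traits can tie or slightly trail the BM baseline.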
Reproducibility
Driver: script/bench_proportion.R. Tree: ape::rtree(300). Traits: simulate_proportion_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_proportion.R, then Rscript script/make_bench_proportion_html.R.
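For readers who want to see what 25% MCAR missingness means before running the driver, here is a minimal base-R sketch. It assumes cell-wise masking on a 300 x 4 trait matrix; the function name mask_mcar and the uniform stand-in data are hypothetical, and the driver script may mask differently (for example per trait).

```r
# Hypothetical helper; assumes cell-wise masking across the whole matrix.
mask_mcar <- function(X, frac = 0.25) {
  n_hide <- round(frac * length(X))
  hidden <- sample(length(X), n_hide)   # every cell is equally likely: MCAR
  mask <- matrix(FALSE, nrow(X), ncol(X))
  mask[hidden] <- TRUE
  X_obs <- X
  X_obs[mask] <- NA
  list(observed = X_obs, mask = mask)
}

set.seed(42)
# Stand-in for 300 species x 4 proportion traits
X <- matrix(runif(300 * 4), nrow = 300, ncol = 4)
m <- mask_mcar(X)

mean(is.na(m$observed))   # 0.25
```

Keeping the mask alongside the observed matrix lets the hidden true values be recovered later for scoring.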