Bottom line. At high signal (1.0), the BM baseline achieves RMSE 0.487 and pigauto 0.507. Strong phylogenetic structure lets the baseline capture most of the variation, so the calibrated gate stays near zero.
At low signal (0.2), BM baseline RMSE rises to 1.117 and pigauto achieves 1.069, but both trail the mean baseline (1.023). With weak phylogenetic structure there is limited information for any method, although the GNN can still exploit cross-trait correlations.
Primary sweep: performance by phylogenetic signal (25% missingness)
Values are averaged across 4 traits and 5 replicates. ★ marks the best method per scenario and metric.
Signal       | RMSE (lower is better)        | Pearson r (higher is better)
             | Mean    | BM      | pigauto   | Mean | BM      | pigauto
Signal = 0.2 | 1.023 ★ | 1.117   | 1.069     | –    | 0.188 ★ | 0.102
Signal = 0.4 | 1.005   | 1.003 ★ | 1.025     | –    | 0.368 ★ | 0.231
Signal = 0.6 | 1.013   | 0.842 ★ | 0.882     | –    | 0.586 ★ | 0.502
Signal = 0.8 | 0.986   | 0.715 ★ | 0.742     | –    | 0.698 ★ | 0.660
Signal = 1.0 | 0.969   | 0.487 ★ | 0.507     | –    | 0.857 ★ | 0.841
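For readers who want to check how the two metrics behave, here is a minimal Python sketch of RMSE and Pearson r on held-out cells. It is an illustration only, not the benchmark driver's R code: the function names and toy values are hypothetical. It also shows one plausible reason the Mean column has "–" for Pearson r: if the mean baseline predicts a single constant per trait (an assumption here), the correlation is undefined.

```python
import math

def rmse(truth, pred):
    """Root-mean-square error over held-out cells (lower is better)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def pearson_r(truth, pred):
    """Pearson correlation; returns None when either side is (numerically) constant."""
    n = len(truth)
    mt, mp = sum(truth) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    vt = sum((t - mt) ** 2 for t in truth)
    vp = sum((p - mp) ** 2 for p in pred)
    if vt < 1e-12 or vp < 1e-12:
        return None  # undefined for a constant prediction (assumed mean-baseline case)
    return cov / math.sqrt(vt * vp)

# Toy held-out values (illustrative only, not benchmark data).
truth = [0.10, 0.40, 0.35, 0.80]
mean_pred = [sum(truth) / len(truth)] * len(truth)  # constant prediction

print(rmse(truth, mean_pred))       # finite error for the constant prediction
print(pearson_r(truth, mean_pred))  # None: correlation is undefined
```

A constant prediction can still be competitive on RMSE while saying nothing about the ordering of species, which is why the two metric blocks in the table are best read together.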
Secondary sweep: boundary density (signal = 0.6)
Boundary density is the proportion of values near 0 or 1. Higher boundary density makes the bounded [0,1] constraint more relevant.
What the benchmark shows
Proportions are modelled on a transformed latent scale. The pipeline logit-transforms bounded values and z-scores that latent column before applying the BM baseline and GNN correction; decoding returns predictions to the [0,1] scale.
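The encode and decode steps described above can be sketched in a few lines of Python. This is a hedged illustration, not pigauto's implementation: the clipping constant `EPS`, the use of the sample standard deviation, and the function names are all assumptions made for the example.

```python
import math

EPS = 1e-6  # hypothetical clipping constant; the pipeline's actual value is not stated

def logit(p):
    p = min(max(p, EPS), 1.0 - EPS)  # keep exact 0 and 1 finite
    return math.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode(props):
    """Logit-transform bounded values, then z-score the latent column."""
    latent = [logit(p) for p in props]
    mu = sum(latent) / len(latent)
    # Sample standard deviation (n - 1), an assumption for this sketch.
    sd = math.sqrt(sum((v - mu) ** 2 for v in latent) / (len(latent) - 1))
    return [(v - mu) / sd for v in latent], mu, sd

def decode(z, mu, sd):
    """Undo the z-score, then map back to the [0, 1] scale."""
    return [expit(v * sd + mu) for v in z]

props = [0.02, 0.15, 0.50, 0.85, 0.98]
z, mu, sd = encode(props)
decoded = decode(z, mu, sd)  # round trip recovers the original proportions
```

Because decoding passes through the inverse logit, any latent prediction, however extreme, lands strictly inside (0, 1), so the bounded constraint holds by construction.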
Signal strength dominates performance. At high phylogenetic signal the BM baseline captures much of the variation because closely related species have similar proportions.
Boundary density affects difficulty. When many values cluster near 0 or 1, the distribution is skewed, and the logit maps those values to large-magnitude latent values, so the transformed latent scale can still be challenging. The table should be used to judge whether the GNN correction helps in each cell.
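The v0.10 rerun is cautious evidence for proportions. In the primary signal sweep shown here, pigauto stays close to the BM baseline but trails it on RMSE at four of the five signal levels (by 0.020 to 0.040) and on Pearson r at all five. Only at signal = 0.2 does it beat BM on RMSE (1.069 vs 1.117). Treat this page as a measured benchmark, not an improvement claim.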
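The comparison with BM can be checked directly from the primary sweep table. The short Python sketch below copies the BM and pigauto RMSE values from that table and computes the per-signal gap (pigauto minus BM). The variable names are illustrative, and nothing here comes from the benchmark scripts themselves.

```python
# RMSE values copied from the primary sweep table: signal -> (BM, pigauto)
rmse_table = {
    0.2: (1.117, 1.069),
    0.4: (1.003, 1.025),
    0.6: (0.842, 0.882),
    0.8: (0.715, 0.742),
    1.0: (0.487, 0.507),
}

# Positive gap means pigauto has higher (worse) RMSE than BM.
gaps = {s: round(pig - bm, 3) for s, (bm, pig) in rmse_table.items()}
mean_gap = sum(gaps.values()) / len(gaps)
pigauto_wins = [s for s, g in gaps.items() if g < 0]

print(gaps)
print(round(mean_gap, 4), pigauto_wins)
```

Averaged over the five signal levels, the gap comes to roughly 0.012 RMSE in BM's favour, with signal = 0.2 the only level where pigauto is ahead.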
Reproducibility
Driver: script/bench_proportion.R. Tree: ape::rtree(300). Traits: simulate_proportion_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_proportion.R, then Rscript script/make_bench_proportion_html.R.