Ordinal-trait benchmark: level count sweep

Tree: ape::rtree(300) · Traits: 3 ordinal traits per scenario · Levels: 3 – 10 · Methods: median imputation · BM baseline (Rphylopars) · pigauto · Replicates: 5 · Missingness: 25% MCAR · Commit c48a42ce6a · Run on 2026-05-11 10:56 · Total wall-clock time: 15.4 min
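Under 25% MCAR (missing completely at random) missingness, whether a cell is hidden does not depend on its value. The following is a minimal sketch of such a mask. It assumes that exactly 25% of all species × trait cells are drawn uniformly without replacement. `mcar_mask` is a hypothetical helper and is not part of pigauto or script/bench_ordinal.R, which may instead hide each cell independently with probability 0.25.

```python
import random

def mcar_mask(n_species, n_traits, frac=0.25, seed=1):
    """Hypothetical helper: return the set of (species, trait) cells to hide.

    Cells are sampled uniformly without replacement, so being hidden is
    independent of the trait value itself (MCAR).
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n_species) for j in range(n_traits)]
    n_hide = round(frac * len(cells))
    return set(rng.sample(cells, n_hide))

# 300 tips x 3 ordinal traits, matching this benchmark's set-up
mask = mcar_mask(300, 3)
print(len(mask), "of", 300 * 3, "cells hidden")
```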

Bottom line. With 3 ordinal levels, pigauto improves only slightly on the BM baseline (RMSE 0.826 vs 0.840, a difference of 0.014). The coarse discretisation leaves little variation for the GNN to exploit.

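More ordinal levels retain finer-grained variation, but in this sweep the gap between pigauto and BM does not widen. At 5 and 7 levels BM is marginally ahead (RMSE 0.809 vs 0.813 and 0.751 vs 0.752), so compare the two methods scenario by scenario.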
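The effect of coarse discretisation can be made concrete by cutting one continuous trait into k ordered classes. This sketch assumes equal-probability cut points at the empirical quantiles. `discretise` is a hypothetical helper and does not necessarily place thresholds the way simulate_ordinal_traits() does. With k = 3, each class absorbs about a third of the 300 tips, so distinct latent values end up sharing a single level.

```python
import random

def discretise(values, k):
    """Hypothetical helper: map continuous values to ordinal levels 1..k,
    using k - 1 cut points at the empirical quantiles 1/k, 2/k, ..."""
    s = sorted(values)
    n = len(s)
    cuts = [s[int(n * q / k)] for q in range(1, k)]
    # level = 1 + number of cut points the value reaches or exceeds
    return [1 + sum(v >= c for c in cuts) for v in values]

rng = random.Random(0)
latent = [rng.gauss(0.0, 1.0) for _ in range(300)]  # one simulated continuous trait
for k in (3, 5, 7, 10):
    levels = discretise(latent, k)
    sizes = [levels.count(l) for l in range(1, k + 1)]
    print(k, "levels, tips per level:", sizes)
```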

Primary sweep: performance by number of ordinal levels (25% missingness)

Average across 3 traits and 5 replicates. ★ marks the best method per scenario.

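Levels      RMSE (lower is better)        Spearman ρ (higher is better)
            Median   BM       pigauto     Median   BM       pigauto
3 levels    1.017    0.840    0.826 ★     n/a      0.648 ★  0.648 ★
5 levels    0.994    0.809 ★  0.813       n/a      0.622 ★  0.619
7 levels    0.978    0.751 ★  0.752       n/a      0.663    0.664 ★
10 levels   n/a      n/a      n/a         n/a      n/a      n/a

n/a = no value reported. Median imputation assigns the same value to every hidden cell of a trait, so its Spearman ρ is undefined. The 10-level scenario has no results in this run. Both methods are starred where they tie at three decimals.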
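Each cell in the table is an average, over 3 traits × 5 replicates, of metrics computed on the hidden cells. This is a minimal pure-Python sketch of the two metrics. It assumes RMSE is computed on the ordinal level codes and that Spearman ρ is the Pearson correlation of average ranks. The function names are hypothetical, and the benchmark script may scale or compute them differently. The `None` branch shows why a constant (median) imputation has no ρ.

```python
import math

def rmse(truth, pred):
    """Root-mean-square error over the held-out cells."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def average_ranks(x):
    """Ranks 1..n; tied values share the mean of their sorted positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks

def spearman(truth, pred):
    """Spearman rho, or None when either input is constant (undefined)."""
    rx, ry = average_ranks(truth), average_ranks(pred)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

truth = [1, 2, 2, 3, 3]                  # true levels of five hidden cells
print(rmse(truth, [1, 2, 3, 3, 2]))      # two cells off by one level: sqrt(2/5)
print(spearman(truth, [1, 2, 3, 3, 2]))
print(spearman(truth, [2, 2, 2, 2, 2]))  # median imputation -> None
```

Under these assumptions, a reported value would be the mean of such per-trait, per-replicate scores.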
[Figure: RMSE by number of ordinal levels; series: Median imputation, BM baseline (Rphylopars), pigauto (BM + GNN)]
[Figure: Spearman ρ by number of ordinal levels; series: Median imputation, BM baseline (Rphylopars), pigauto (BM + GNN)]

Secondary sweep: signal strength (5 levels)

RMSE by phylogenetic signal at fixed 5 ordinal levels.

Phylogenetic signal   Median   BM       pigauto
0.3                   0.979    1.042    0.978
0.6                   0.988    0.916    0.900
1.0                   1.012    0.592    0.586
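The best-method rule and the pigauto-versus-BM gap can be checked directly against the RMSE values reported for this sweep. The sketch below hard-codes those values and assumes "best" simply means the lowest RMSE, following the ★ rule used for the primary sweep. `signal_rmse` is an illustrative name.

```python
# RMSE values reported for the signal-strength sweep (5 ordinal levels)
signal_rmse = {
    0.3: {"Median": 0.979, "BM": 1.042, "pigauto": 0.978},
    0.6: {"Median": 0.988, "BM": 0.916, "pigauto": 0.900},
    1.0: {"Median": 1.012, "BM": 0.592, "pigauto": 0.586},
}

for signal, row in signal_rmse.items():
    best = min(row, key=row.get)        # lowest RMSE is the best method
    gain = row["BM"] - row["pigauto"]   # positive: pigauto beats BM
    print(f"signal={signal}: best={best}, BM - pigauto = {gain:+.3f}")
```

At signal 0.3, BM (1.042) does worse than median imputation (0.979), and pigauto beats median by only 0.001. The BM minus pigauto gap shrinks from 0.064 to 0.016 to 0.006 as signal rises.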

What the benchmark shows

Reproducibility

Driver: script/bench_ordinal.R. Tree: ape::rtree(300). Traits: simulate_ordinal_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_ordinal.R, then Rscript script/make_bench_ordinal_html.R.