Ordinal-trait benchmark: level count sweep

Tree: ape::rtree(300) · Traits: 3 ordinal per scenario · Levels: 3 – 10 · Methods: median · BM baseline (Rphylopars) · pigauto · Replicates: 5 · Missingness: 25% MCAR · Commit 794537121b · Report generated 2026-05-30 12:04 · Total wall: 63.3 min

Bottom line. With 3 ordinal levels the BM baseline achieves RMSE 0.768 and pigauto achieves 0.765. The coarse discretisation limits the scope for GNN improvement.

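With more ordinal levels there is finer-grained variation, so compare pigauto against BM scenario by scenario: at 5 levels BM is slightly ahead (RMSE 0.782 vs 0.787), at 7 levels pigauto is marginally ahead (0.720 vs 0.722), and the 10-level row reports no values.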
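To make the "coarse discretisation" point concrete, here is a minimal sketch of how a continuous latent trait might be cut into k ordered levels. It is an illustration only: the helper discretise_ordinal() is hypothetical, the quantile thresholds are an assumption, and the latent values are plain rnorm() draws rather than traits evolved on the tree. The internals of simulate_ordinal_traits() are not shown in this report and may differ.

```r
# Hypothetical helper (not the report's simulate_ordinal_traits()):
# cut a continuous latent trait into k ordered levels at
# equal-probability quantile thresholds.
discretise_ordinal <- function(latent, k) {
  breaks <- quantile(latent, probs = seq(0, 1, length.out = k + 1))
  # labels = FALSE returns integer codes 1..k instead of a factor
  cut(latent, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

set.seed(1)
latent <- rnorm(300)  # stand-in for a latent trait on 300 tips (assumption)

for (k in c(3, 5, 7, 10)) {
  lev <- discretise_ordinal(latent, k)
  # Rank correlation between the latent trait and its discretised version
  rho <- cor(latent, lev, method = "spearman")
  cat(sprintf("%2d levels: %d distinct values, Spearman rho vs latent = %.3f\n",
              k, length(unique(lev)), rho))
}
```

Running the loop shows how much of the latent ordering each level count retains, which is one way to reason about how much room a method has to improve on a baseline.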

Primary sweep: performance by number of ordinal levels (25% missingness)

Average across 3 traits and 5 replicates. ★ marks the best method per scenario.

Levels      RMSE (lower is better)       Spearman ρ (higher is better)
            Median   BM      pigauto     Median   BM      pigauto
3 levels    1.017    0.768   0.765       -        0.675   0.675
5 levels    0.994    0.782   0.787       -        0.629   0.623
7 levels    0.978    0.722   0.720       -        0.684   0.684
10 levels   -        -       -           -        -       -
Figure: RMSE by number of ordinal levels. x-axis: number of ordinal levels; y-axis: RMSE; series: Median imputation, BM baseline (Rphylopars), pigauto (BM + GNN).
Figure: Spearman ρ by number of ordinal levels. x-axis: number of ordinal levels; y-axis: Spearman ρ; series as above.
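The scoring protocol described above (25% MCAR masking, metrics on held-out cells, averaged across traits) can be sketched in base R as follows. This is a sketch under stated assumptions, not the benchmark driver: the data are random integer codes with no phylogeny, ordinal codes are treated as numeric, and the scale on which the report computes RMSE is not stated. Only the median baseline is shown, since it needs no external package.

```r
set.seed(42)
n_species <- 300; n_traits <- 3; k <- 5

# Stand-in data: integer ordinal codes 1..k (no phylogeny, illustration only)
truth <- matrix(sample.int(k, n_species * n_traits, replace = TRUE),
                nrow = n_species, ncol = n_traits)

# MCAR mask: each cell is hidden independently with probability 0.25
mask <- matrix(runif(n_species * n_traits) < 0.25, nrow = n_species)
observed <- truth
observed[mask] <- NA

# Median imputation: fill each trait's missing cells with its observed median
imputed <- observed
for (j in seq_len(n_traits)) {
  imputed[mask[, j], j] <- median(observed[, j], na.rm = TRUE)
}

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Score on held-out cells only, per trait, then average across traits
rmse_by_trait <- sapply(seq_len(n_traits), function(j)
  rmse(imputed[mask[, j], j], truth[mask[, j], j]))
rho_by_trait <- sapply(seq_len(n_traits), function(j)
  suppressWarnings(cor(imputed[mask[, j], j], truth[mask[, j], j],
                       method = "spearman")))

mean(rmse_by_trait)
rho_by_trait  # NA per trait: a constant prediction has no rank correlation
```

Because median imputation predicts a single constant per trait, its Spearman ρ is undefined, which may be why the Median column carries no ρ values in the table.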

Secondary sweep: signal strength (5 levels)

RMSE by phylogenetic signal at fixed 5 ordinal levels.

Figure: RMSE by phylogenetic signal (5 levels). x-axis: phylogenetic signal; y-axis: RMSE; series: Median imputation, BM baseline (Rphylopars), pigauto (BM + GNN). Plotted values:

Signal    Median   BM      pigauto
0.3       0.979    1.024   0.983
0.6       0.988    0.904   0.918
1.0       1.012    0.541   0.544

What the benchmark shows

Reproducibility

Driver: script/bench_ordinal.R. Tree: ape::rtree(300). Traits: simulate_ordinal_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_ordinal.R, then Rscript script/make_bench_ordinal_html.R.