Tree: ape::rtree(300) ·
Traits: 3 ordinal per scenario ·
Levels: 3 – 10 ·
Methods: median · BM baseline (Rphylopars) · pigauto ·
Replicates: 5 ·
Missingness: 25% MCAR ·
Commit c48a42ce6a ·
Run on 2026-05-11 10:56 ·
Total wall: 15.4 min
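Bottom line. With 3 ordinal levels, the BM (Brownian motion) baseline reaches RMSE 0.840 and pigauto reaches 0.826, a gap of 0.014. Coarse discretisation leaves little room for the GNN to improve on the baseline.
With more ordinal levels there is finer-grained variation, but the two methods stay close: BM is marginally ahead on RMSE at 5 levels (0.809 vs 0.813) and at 7 levels (0.751 vs 0.752). Compare pigauto against BM scenario by scenario.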
Primary sweep: performance by number of ordinal levels (25% missingness)
Average across 3 traits and 5 replicates. ★ marks the best method per scenario.
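RMSE: lower is better. Spearman ρ: higher is better.

| Levels | RMSE: Median | RMSE: BM | RMSE: pigauto | Spearman ρ: Median | Spearman ρ: BM | Spearman ρ: pigauto |
|---|---|---|---|---|---|---|
| 3 levels | 1.017 | 0.840 | 0.826 ★ | – | 0.648 ★ | 0.648 |
| 5 levels | 0.994 | 0.809 ★ | 0.813 | – | 0.622 ★ | 0.619 |
| 7 levels | 0.978 | 0.751 ★ | 0.752 | – | 0.663 | 0.664 ★ |
| 10 levels | – | – | – | – | – | – |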
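The scoring behind a table like this can be sketched in a few lines. The following is a minimal Python illustration, not the R code in script/bench_ordinal.R: it hides 25% of cells completely at random (MCAR), fills them with the median of the observed cells, and computes RMSE on the hidden cells. The helper names (`mcar_mask`, `zscore`, `rmse`) are hypothetical, the trait is drawn independently per tip rather than simulated on a tree, and scoring on the z-scored scale is an assumption (it is at least consistent with the median baseline landing near RMSE 1.0 above).

```python
import math
import random
import statistics

def mcar_mask(n, frac, rng):
    """Return sorted indices of cells hidden completely at random (MCAR)."""
    k = round(n * frac)
    return sorted(rng.sample(range(n), k))

def zscore(values):
    """Centre and scale values to mean 0 and (population) SD 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def rmse(truth, pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

rng = random.Random(0)
# Hypothetical 5-level ordinal trait for 300 tips (no phylogenetic structure).
trait = [rng.randint(1, 5) for _ in range(300)]
z = zscore(trait)

hidden = mcar_mask(len(z), 0.25, rng)
hidden_set = set(hidden)
observed = [z[i] for i in range(len(z)) if i not in hidden_set]

# Median baseline: every hidden cell receives the median of the observed cells.
fill = statistics.median(observed)
score = rmse([z[i] for i in hidden], [fill] * len(hidden))
print(len(hidden), round(score, 3))
```

Because the median baseline ignores the tree entirely, any method that uses phylogenetic signal should score below it when signal is present, which is the pattern the BM and pigauto columns show.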
Secondary sweep: signal strength (5 levels)
RMSE by phylogenetic signal at fixed 5 ordinal levels.
What the benchmark shows
BM on the integer-z scale is a strong baseline for ordinal traits. Rphylopars treats the z-scored integer codes as continuous, which works well when phylogenetic signal is moderate to high.
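As a rough illustration of what "treating z-scored integer codes as continuous" involves, the Python sketch below z-scores the ordinal codes and maps a continuous prediction back to the nearest valid level. It does not call Rphylopars; `to_z` and `from_z` are hypothetical helpers, and whether the benchmark rounds predictions back to levels before scoring is not stated in this report, so the back-transform is an assumption.

```python
import statistics

def to_z(codes):
    """Z-score integer ordinal codes; also return the mean and SD that were used."""
    mu = statistics.fmean(codes)
    sd = statistics.pstdev(codes)
    return [(c - mu) / sd for c in codes], mu, sd

def from_z(z_value, mu, sd, n_levels):
    """Map a z-scale prediction to the nearest valid level in 1..n_levels."""
    # Note: Python's round() sends exact .5 cases to the nearest even integer.
    code = round(z_value * sd + mu)
    return min(max(code, 1), n_levels)

codes = [1, 2, 2, 3, 3, 3, 4, 5]  # hypothetical 5-level trait
z, mu, sd = to_z(codes)

# A continuous imputer can predict values that fall between levels
# or outside the observed range; both cases are snapped to a valid level.
print(from_z(0.4, mu, sd, 5))
print(from_z(9.0, mu, sd, 5))
```

The continuous treatment ignores the fact that gaps between adjacent levels need not be equal, which is one reason a baseline of this kind can be expected to work best when levels are roughly evenly spaced.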
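More ordinal levels expose finer-grained variation. With only 3 levels the discretisation is coarse and the baseline captures most of the structure. With 10 levels there is more information in principle, but the 10-level row reports no values in this run, and at 5 and 7 levels pigauto and BM differ by at most 0.004 RMSE. This run therefore does not justify a general improvement claim.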
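The effect of coarse discretisation can be illustrated with a toy threshold model. The Python sketch below assumes ordinal codes arise by cutting a continuous latent value at equal-frequency thresholds; whether simulate_ordinal_traits() works this way is not stated here, and the latent values are independent draws rather than values evolved on a tree. The helpers `discretise` and `pearson` are hypothetical. It reports how strongly the integer codes correlate with the latent value for 3, 5, 7, and 10 levels.

```python
import bisect
import math
import random

def discretise(latent, n_levels):
    """Cut continuous values into codes 1..n_levels at equal-frequency thresholds."""
    ordered = sorted(latent)
    n = len(ordered)
    cuts = [ordered[(k * n) // n_levels] for k in range(1, n_levels)]
    # A value's code is one plus the number of thresholds at or below it.
    return [1 + bisect.bisect_right(cuts, x) for x in latent]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
latent = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical latent trait

corr = {k: pearson(latent, discretise(latent, k)) for k in (3, 5, 7, 10)}
for k, r in corr.items():
    print(k, round(r, 3))
```

In this toy setting the codes retain more of the latent signal as the number of levels grows, which is the sense in which more levels "expose finer-grained variation". It says nothing about which imputation method benefits from that extra information.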
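Spearman rank correlation broadly tracks RMSE. BM and pigauto differ by at most 0.003 in ρ in every reported scenario, mirroring their near-identical RMSE, although the best-marked method is not always the same on both metrics. Because ordinal imputation cares about rank preservation, Spearman ρ is the more interpretable metric for downstream use.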
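For reference, Spearman ρ is the Pearson correlation of the ranks, with tied values sharing their average rank. Ties matter here because ordinal traits with few levels are heavily tied. The Python sketch below is a generic implementation, not the benchmark's scoring code, and the function names are hypothetical. It returns `None` when either side is constant, which is presumably why the Median column shows no ρ: a single fill value per trait has no rank variation.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(truth, pred):
    """Spearman rho as the Pearson correlation of average ranks; None if undefined."""
    rt, rp = average_ranks(truth), average_ranks(pred)
    mt, mp = sum(rt) / len(rt), sum(rp) / len(rp)
    stt = sum((a - mt) ** 2 for a in rt)
    spp = sum((b - mp) ** 2 for b in rp)
    if stt == 0 or spp == 0:
        return None  # a constant input has no rank variation
    stp = sum((a - mt) * (b - mp) for a, b in zip(rt, rp))
    return stp / math.sqrt(stt * spp)

truth = [1, 2, 2, 3, 5, 4]             # hypothetical true ordinal codes
pred = [1.2, 1.9, 2.4, 2.8, 4.1, 4.6]  # hypothetical continuous predictions
print(spearman(truth, pred))
print(spearman(truth, [3, 3, 3, 3, 3, 3]))
```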
Reproducibility
Driver: script/bench_ordinal.R. Tree: ape::rtree(300). Traits: simulate_ordinal_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_ordinal.R, then Rscript script/make_bench_ordinal_html.R.