Bottom line. With 3 ordinal levels the Brownian-motion (BM) baseline achieves RMSE 0.768 and pigauto achieves 0.765, a difference of 0.003. Such coarse discretisation leaves little scope for GNN improvement.
More ordinal levels provide finer-grained variation, but the ranking is not uniform across scenarios, so pigauto should be compared against BM scenario by scenario.
Primary sweep: performance by number of ordinal levels (25% missingness)
Averaged across 3 traits and 5 replicates. ★ marks the best method per scenario.
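RMSE: lower is better. Spearman ρ: higher is better. – marks a cell with no reported value.

| Levels    | RMSE: Median | RMSE: BM | RMSE: pigauto | Spearman ρ: Median | Spearman ρ: BM | Spearman ρ: pigauto |
|-----------|--------------|----------|---------------|--------------------|----------------|---------------------|
| 3 levels  | 1.017        | 0.768    | 0.765 ★       | –                  | 0.675 ★        | 0.675               |
| 5 levels  | 0.994        | 0.782 ★  | 0.787         | –                  | 0.629 ★        | 0.623               |
| 7 levels  | 0.978        | 0.722    | 0.720 ★       | –                  | 0.684 ★        | 0.684               |
| 10 levels | –            | –        | –             | –                  | –              | –                   |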
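For readers who want to see how the two metrics in the table are typically computed, here is a minimal base-R sketch. It is not taken from script/bench_ordinal.R: the helper names `rmse` and `spearman_rho`, the toy data, and the noise level are all assumptions made for illustration. It scores stand-in predictions on a 25% held-out subset of z-scored integer codes. It also shows that a constant prediction (such as a median baseline) has no defined rank correlation, which may be why the Median column shows a dash for Spearman ρ.

```r
# Hypothetical helpers; names are illustrative, not the benchmark's own.
rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))
spearman_rho <- function(truth, pred) {
  stats::cor(truth, pred, method = "spearman")
}

set.seed(42)
n <- 300                                        # same size as the benchmark tree
truth <- sample(1:5, n, replace = TRUE)         # toy integer codes, 5 ordinal levels
z <- (truth - mean(truth)) / sd(truth)          # z-scored integer codes
held_out <- sample(n, size = round(0.25 * n))   # 25% missingness

# Stand-in "imputations" (assumed, not real model output):
# a noisy copy of the truth, and the observed median as a trivial baseline.
pred_model  <- z[held_out] + rnorm(length(held_out), sd = 0.7)
pred_median <- rep(median(z[-held_out]), length(held_out))

rmse(z[held_out], pred_model)
rmse(z[held_out], pred_median)
spearman_rho(z[held_out], pred_model)

# A constant prediction has zero variance, so Spearman rho is NA.
suppressWarnings(spearman_rho(z[held_out], pred_median))
```

On the z scale, a constant prediction has an RMSE close to 1, which is consistent with the Median column in the table.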
Secondary sweep: signal strength (5 levels)
RMSE by phylogenetic signal at fixed 5 ordinal levels.
What the benchmark shows
BM on the integer-z scale is a strong baseline for ordinal traits in this setup. Rphylopars treats the z-scored integer codes as continuous, which works well in these simulated cells.
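The "integer-z scale" can be illustrated with a short base-R sketch. The functions `encode_ordinal_z` and `decode_ordinal_z` are hypothetical names introduced here, and rounding to the nearest level is one plausible back-transform, not necessarily the one the benchmark uses. The sketch maps an ordered factor to z-scored integer codes, which a continuous method can then impute, and maps continuous values back to valid levels.

```r
# Hypothetical helpers for illustration; the real pipeline may differ.
encode_ordinal_z <- function(x) {
  stopifnot(is.ordered(x))
  codes <- as.integer(x)                  # 1, 2, ..., K (NA stays NA)
  mu    <- mean(codes, na.rm = TRUE)
  sigma <- sd(codes, na.rm = TRUE)
  list(z = (codes - mu) / sigma, mu = mu, sigma = sigma, levels = levels(x))
}

decode_ordinal_z <- function(z, enc) {
  k     <- length(enc$levels)
  codes <- round(z * enc$sigma + enc$mu)  # back to the integer scale
  codes <- pmin(pmax(codes, 1), k)        # clamp to valid levels
  factor(enc$levels[codes], levels = enc$levels, ordered = TRUE)
}

x <- factor(c("low", "mid", "high", NA, "mid", "low"),
            levels = c("low", "mid", "high"), ordered = TRUE)
enc <- encode_ordinal_z(x)

# Pretend a continuous imputer filled the missing cell with z = 0.4 (assumed value).
z_filled <- enc$z
z_filled[is.na(z_filled)] <- 0.4
decode_ordinal_z(z_filled, enc)
```

Observed cells round-trip to their original levels, and the filled cell is assigned the nearest level on the integer scale.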
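More ordinal levels expose finer-grained variation. With only 3 levels the discretisation is coarse and the baseline captures most of the structure. With 10 levels there is more information in principle, but the 10-level cells are empty in this run, and at 3, 5, and 7 levels the RMSE gap between pigauto and BM never exceeds 0.005. This run therefore does not justify a general improvement claim.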
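The effect of coarse discretisation can be sketched in base R. This assumes a continuous latent trait cut into equal-probability bins; whether simulate_ordinal_traits() discretises this way is not stated here, and the function `discretise` is a hypothetical helper. The sketch measures how strongly the integer codes correlate with the latent values as the number of levels grows.

```r
set.seed(1)
latent <- rnorm(5000)   # stand-in for a continuous latent trait (assumed)

# Hypothetical helper: cut x into k equal-probability ordinal levels.
discretise <- function(x, k) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = k + 1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

levels_k <- c(3, 5, 7, 10)
retained <- sapply(levels_k, function(k) cor(latent, discretise(latent, k)))
names(retained) <- paste(levels_k, "levels")
round(retained, 3)
```

Under these assumptions the correlation rises with the number of levels but with diminishing returns, which fits the reading that a few levels already capture most of the structure available to any method.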
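Spearman rank correlation broadly tracks RMSE. At 3 and 7 levels the two methods tie on ρ to three decimals (0.675 and 0.684) while pigauto is marginally ahead on RMSE; at 5 levels BM leads on both metrics. Because ordinal imputation cares about rank preservation, Spearman ρ is the more interpretable metric for downstream use.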
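A small base-R example shows why the two metrics can disagree. The vectors are made-up toy values, not benchmark output. Two sets of predictions with the same ordering get the same Spearman ρ, but the one shrunk toward the centre has a larger RMSE.

```r
truth  <- c(1, 2, 2, 3, 4, 5, 5)
pred_a <- c(1.2, 1.9, 2.4, 2.8, 3.9, 4.6, 5.1)
pred_b <- 0.5 * pred_a + 1.5   # shrunk toward the centre, same ordering

rho_a <- cor(truth, pred_a, method = "spearman")
rho_b <- cor(truth, pred_b, method = "spearman")

rmse_a <- sqrt(mean((truth - pred_a)^2))
rmse_b <- sqrt(mean((truth - pred_b)^2))

c(rho_a = rho_a, rho_b = rho_b, rmse_a = rmse_a, rmse_b = rmse_b)
```

Any strictly increasing transformation of the predictions leaves ρ unchanged, so ρ isolates rank preservation while RMSE also penalises miscalibrated scale.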
Reproducibility
Driver: script/bench_ordinal.R. Tree: ape::rtree(300). Traits: simulate_ordinal_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_ordinal.R, then Rscript script/make_bench_ordinal_html.R.