Bottom line. At high phylogenetic signal (1.0), phylogenetic label propagation reaches 69.6% accuracy and pigauto reaches 70.4%, a gain of 0.8 percentage points. When the baseline is already strong, the calibrated gate often keeps pigauto close to it.
At low signal (0.2), all methods struggle: the baseline reaches 58.0% and pigauto 51.2%, so pigauto trails by 6.8 points. With weak phylogenetic structure, there is little information for any method to exploit.
Primary sweep: accuracy by phylogenetic signal (25% missingness)
Averaged across 4 traits and 5 replicates. ★ marks the best method in each scenario.
Phylogenetic label propagation is a strong baseline for binary traits. At high signal in this sweep, its similarity-weighted average of neighbours is difficult to improve on. For the GNN to add value, it has to beat this baseline by enough to earn weight from the gate.
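The similarity-weighted neighbour average can be illustrated with a minimal sketch, written in Python for illustration even though the benchmark itself runs in R. It assumes a precomputed, non-negative phylogenetic similarity matrix between tips (for example, shared branch length from the root) and an iterative scheme in which observed tips are clamped and missing tips are repeatedly replaced by the weighted mean of their neighbours. The function names `propagate_labels` and `predict` are hypothetical; this is not pigauto's implementation of the baseline.

```python
from typing import List, Optional


def propagate_labels(
    sim: List[List[float]],
    labels: List[Optional[int]],
    n_iter: int = 20,
) -> List[float]:
    """Estimate P(trait = 1) for every taxon by iterated weighted averaging.

    sim[i][j]: hypothetical non-negative phylogenetic similarity of taxa i, j.
    labels[i]: 0 or 1 if observed, None if missing.
    """
    observed = [y for y in labels if y is not None]
    # Missing taxa start at the observed prevalence of state 1.
    prior = sum(observed) / len(observed) if observed else 0.5
    scores = [float(y) if y is not None else prior for y in labels]
    n = len(labels)
    for _ in range(n_iter):
        new_scores = scores[:]
        for i in range(n):
            if labels[i] is not None:
                continue  # observed taxa keep their label
            num = sum(sim[i][j] * scores[j] for j in range(n) if j != i)
            den = sum(sim[i][j] for j in range(n) if j != i)
            new_scores[i] = num / den if den > 0 else prior
        scores = new_scores
    return scores


def predict(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Turn propagated probabilities into 0/1 calls."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy tree: taxa 0 and 1 are sisters, as are taxa 2 and 3.
toy_sim = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.1],
    [0.1, 0.1, 0.0, 0.9],
    [0.1, 0.1, 0.9, 0.0],
]
# Taxon 1 is pulled toward its sister's state (1), taxon 3 toward its sister's (0).
toy_scores = propagate_labels(toy_sim, [1, None, 0, None])
```

With strong clade structure like this toy example, each missing tip inherits its sister's state; as the similarities between clades grow relative to those within clades (weaker signal), the propagated scores drift toward the prior and carry less information.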
Signal matters strongly in this simulation. At low signal (0.2) the phylogenetic structure is weak, which limits every phylogeny-based method. At high signal (1.0), both the neighbour baseline and pigauto improve, but which of them ranks first still varies across cells of the table.
pigauto tracks label propagation closely at high signal but can trail it, as it does at low signal. Read the table by signal level and imbalance setting: the GNN's contribution is not uniformly positive in this run.
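Why a calibrated gate keeps pigauto near a strong baseline can be illustrated with a simple sketch. It assumes the gate is a single weight g in [0, 1] that blends the two probability estimates, p = (1 − g)·p_baseline + g·p_GNN, and that g is picked by grid search to maximise accuracy on held-out observed labels. This parameterisation and the names `blend` and `calibrate_gate` are hypothetical stand-ins; pigauto's actual gate may be defined and calibrated differently.

```python
from typing import Sequence, List, Tuple


def accuracy(probs: Sequence[float], truth: Sequence[int]) -> float:
    """Fraction of 0/1 calls (threshold 0.5) that match the true labels."""
    hits = sum(1 for p, y in zip(probs, truth) if (p >= 0.5) == (y == 1))
    return hits / len(truth)


def blend(p_base: Sequence[float], p_gnn: Sequence[float], g: float) -> List[float]:
    """Convex combination of baseline and GNN probabilities with gate weight g."""
    return [(1.0 - g) * b + g * n for b, n in zip(p_base, p_gnn)]


def calibrate_gate(
    p_base: Sequence[float],
    p_gnn: Sequence[float],
    truth: Sequence[int],
    grid_size: int = 11,
) -> Tuple[float, float]:
    """Return (g, validation accuracy) for the best gate weight on a grid.

    Starts at g = 0 (pure baseline) and only moves to a larger g when the
    blend strictly improves validation accuracy, so ties favour the baseline.
    """
    best_g, best_acc = 0.0, accuracy(p_base, truth)
    for k in range(1, grid_size):
        g = k / (grid_size - 1)
        acc = accuracy(blend(p_base, p_gnn, g), truth)
        if acc > best_acc:
            best_g, best_acc = g, acc
    return best_g, best_acc
```

Breaking ties toward g = 0 is one way to reproduce the behaviour described above: if the GNN never beats the baseline on held-out labels, the gate stays closed and predictions equal the baseline's; it opens only as far as needed to fix errors the baseline makes.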
Reproducibility
Driver: script/bench_binary.R. Tree: ape::rtree(300). Traits: simulate_binary_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_binary.R, then Rscript script/make_bench_binary_html.R.