Binary-trait benchmark: phylogenetic signal sweep

Tree: ape::rtree(300) · Traits: 4 binary per scenario · Signal: 0.2 – 1.0 · Methods: mode · phylo label propagation · pigauto · Replicates: 5 · Missingness: 25% MCAR · Commit 1ac34b11a9 · Run on 2026-05-11 10:56 · Total wall: 15.2 min

Bottom line. At high phylogenetic signal (1.0), phylo label propagation (LP, the baseline) reaches 69.6% accuracy and pigauto 66.8%. Across the sweep pigauto stays within 3 percentage points of LP but never exceeds it, consistent with the calibrated gate keeping pigauto close to the baseline when the baseline is already strong.

At low signal (0.2), all methods struggle: mode imputation 49.8%, phylo label propagation (the baseline) 58.0%, pigauto 57.9%. With weak phylogenetic structure there is limited information for any method to exploit.

Primary sweep: accuracy by phylogenetic signal (25% missingness)

Values are averaged across 4 traits and 5 replicates. ★ marks the best method per scenario.

Signal        Accuracy (higher is better)    Brier score (lower is better)
              Mode     LP        pigauto     Mode     LP        pigauto
Signal = 0.2  0.498    0.580 ★   0.579       0.290    0.243     0.243
Signal = 0.4  0.500    0.637 ★   0.614       0.285    0.233 ★   0.238
Signal = 0.6  0.506    0.651 ★   0.641       0.282    0.227 ★   0.229
Signal = 0.8  0.511    0.675 ★   0.667       0.281    0.220 ★   0.220
Signal = 1.0  0.503    0.696 ★   0.668       0.289    0.210 ★   0.216
[Figure: Accuracy by phylogenetic signal. x-axis: phylogenetic signal (0.2 to 1.0); y-axis: accuracy; series: Mode imputation, Phylo label propagation, pigauto (LP + GNN).]
[Figure: Brier score by phylogenetic signal. x-axis: phylogenetic signal (0.2 to 1.0); y-axis: Brier score; series: Mode imputation, Phylo label propagation, pigauto (LP + GNN).]
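
For readers less familiar with the two metrics reported above, the following minimal sketch shows how accuracy and the Brier score could be computed from predicted probabilities on held-out cells. It is written in Python purely for illustration (the benchmark driver is an R script), the data are made up, and the 0.5 decision threshold is an assumption rather than something this report states.

```python
def accuracy(y_true, p_pos, threshold=0.5):
    """Fraction of cells whose thresholded prediction matches the true label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, p_pos))
    return hits / len(y_true)


def brier_score(y_true, p_pos):
    """Mean squared difference between predicted P(y = 1) and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pos)) / len(y_true)


# Hypothetical held-out cells: true labels and predicted P(y = 1).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
p_pos = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3]

acc = accuracy(y_true, p_pos)
brier = brier_score(y_true, p_pos)

# Reference point: a constant forecast of 0.5 for every cell.
brier_coin = brier_score(y_true, [0.5] * len(y_true))
```

In this toy example `acc` is 0.75 and `brier` is about 0.15. A constant forecast of 0.5 always scores exactly 0.25, which is a convenient reference point when reading the Brier columns.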

Secondary sweep: class imbalance (signal = 0.6)

The threshold quantile q controls class balance: q = 0.5 gives balanced classes, q = 0.9 a rare positive class.

[Figure: Accuracy by class imbalance (signal = 0.6). x-axis: scenario; y-axis: accuracy; series: Mode imputation, Phylo label propagation, pigauto (LP + GNN).]

Scenario           Mode     LP       pigauto
Imbalance q = 0.5  0.505    0.648    0.610
Imbalance q = 0.7  0.277    0.743    0.741
Imbalance q = 0.9  0.100    0.900    0.900
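
One way to read the threshold quantile is as a cut on a continuous latent value: tips above the q-th quantile are labelled positive. The sketch below illustrates that reading in Python. It is an assumption about how the imbalance scenarios might be constructed, not a description of what simulate_binary_traits() does. The helper name `binarize_at_quantile` is hypothetical, and the latent values here are independent Gaussian draws with no phylogenetic structure.

```python
import random


def binarize_at_quantile(liability, q):
    """Label a value 1 when it exceeds the q-th empirical quantile, else 0."""
    ranked = sorted(liability)
    cut = ranked[round(q * len(ranked)) - 1]  # largest value still labelled 0
    return [int(x > cut) for x in liability]


rng = random.Random(1)
# Hypothetical latent values, one per tip (300 tips as in the benchmark tree).
liability = [rng.gauss(0.0, 1.0) for _ in range(300)]

pos_frac = {}
for q in (0.5, 0.7, 0.9):
    labels = binarize_at_quantile(liability, q)
    pos_frac[q] = sum(labels) / len(labels)
```

Under this assumption the positive class makes up about 50%, 30% and 10% of tips for q = 0.5, 0.7 and 0.9 respectively.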

What the benchmark shows

Reproducibility

Driver: script/bench_binary.R. Tree: ape::rtree(300). Traits: simulate_binary_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_binary.R, then Rscript script/make_bench_binary_html.R.
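
The 25% MCAR missingness listed in the header can be illustrated with seeded masks, one per replicate, so that every method is scored on the same hidden cells. The Python sketch below is an illustration only. The helper `mcar_mask`, the use of the replicate index as the seed, and hiding an exact 25% of cells (rather than hiding each cell independently with probability 0.25) are all assumptions, not details taken from script/bench_binary.R.

```python
import random


def mcar_mask(n_tips, n_traits, frac_missing, seed):
    """Return an n_tips x n_traits grid of booleans; True marks a hidden cell."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n_tips) for j in range(n_traits)]
    # Every cell is equally likely to be hidden, independent of its value (MCAR).
    hidden = set(rng.sample(cells, round(frac_missing * len(cells))))
    return [[(i, j) in hidden for j in range(n_traits)] for i in range(n_tips)]


# 300 tips, 4 traits, 25% hidden, 5 replicates (sizes taken from the header).
masks = [mcar_mask(300, 4, 0.25, seed=rep) for rep in range(5)]
n_hidden = [sum(map(sum, m)) for m in masks]
```

Each mask hides 300 of the 1,200 cells, and re-running with the same seed reproduces the same mask.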