Count-trait benchmark: Poisson / NegBin

Tree: ape::rtree(300) · Traits: 3 count traits per scenario · Mean counts: 5, 20, 100, 500 · Methods: mean imputation, BM baseline (Rphylopars), pigauto · Replicates: 5 · Missingness: 25% MCAR · Commit 794537121b · Report generated 2026-05-30 12:04 · Total wall time: 63.4 min
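Under 25% MCAR (missing completely at random), cells are hidden independently of trait values and of the phylogeny. The following is a minimal Python sketch. It assumes the benchmark hides exactly 25% of the 300 × 3 trait matrix in each replicate, although it may instead drop each cell independently with probability 0.25. `mcar_mask` is a hypothetical helper and is not part of pigauto or Rphylopars.

```python
import random

def mcar_mask(n_rows, n_cols, frac=0.25, seed=None):
    """Pick round(frac * n_rows * n_cols) cells uniformly at random to hide (MCAR).

    Hypothetical helper: the benchmark's own masking code may differ.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    n_hidden = round(frac * len(cells))
    # The choice never looks at cell values, which is what makes the pattern MCAR.
    return set(rng.sample(cells, n_hidden))

# 300 tips x 3 count traits, as in this benchmark
mask = mcar_mask(300, 3, frac=0.25, seed=1)
print(len(mask), "of", 300 * 3, "cells hidden")
```

Sampling a fixed number of cells keeps the missing fraction identical across replicates. Per-cell Bernoulli masking would instead let the fraction fluctuate around 25%.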

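Bottom line. For sparse counts (mean = 5), the BM baseline achieves RMSE 0.696 and pigauto 0.699 on the log1p-z scale, which is essentially a tie. Low counts are inherently noisy, and this limits every method: a Poisson count's variance equals its mean, so its relative spread is large when the mean is small.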
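The following is a minimal sketch of the scale on which these errors are reported, together with the reason low counts are noisy. It assumes that "log1p-z" means log1p followed by z-scoring with the sample mean and standard deviation. The benchmark's exact standardization, such as whether it uses only observed cells, is not stated here. The function names are illustrative.

```python
import math

def log1p_z(counts):
    """log1p-transform counts, then z-score them with the sample mean and SD.

    Assumption: standardization uses all supplied values; the benchmark may
    instead use only the observed (non-missing) cells.
    """
    logged = [math.log1p(c) for c in counts]
    n = len(logged)
    mean = sum(logged) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in logged) / (n - 1))
    return [(x - mean) / sd for x in logged]

def rmse(truth, pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def poisson_cv(mu):
    """Coefficient of variation of a Poisson(mu) count: sqrt(mu) / mu = 1 / sqrt(mu)."""
    return 1 / math.sqrt(mu)

for mu in (5, 20, 100, 500):
    print(f"mu = {mu:>3}: CV = {poisson_cv(mu):.3f}")
```

The coefficient of variation is about 0.45 at μ = 5 and about 0.045 at μ = 500. A single sparse count therefore carries much more sampling noise around its expected value, whichever imputation method is used.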

For dense counts (mean = 500), pigauto raises RMSE from 0.486 (BM baseline) to 0.498, which is 2.5% worse. The log1p transform compresses the scale, but the effect of adding the GNN depends on the scenario. It ranges from +0.4% RMSE at μ = 5 to +7.4% at μ = 20.
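The following arithmetic computes the relative RMSE change of pigauto against the BM baseline for each scenario, using the values from the primary-sweep table. `pct_change` is an illustrative helper. A positive value means pigauto has higher RMSE, i.e. it is worse.

```python
# RMSE from the primary-sweep table: mean count -> (BM baseline, pigauto)
RMSE = {5: (0.696, 0.699), 20: (0.557, 0.598), 100: (0.529, 0.535), 500: (0.486, 0.498)}

def pct_change(baseline, new):
    """Relative change of `new` versus `baseline`, in percent (positive = higher RMSE)."""
    return 100 * (new - baseline) / baseline

for mu, (bm, pig) in RMSE.items():
    print(f"mu = {mu:>3}: {pct_change(bm, pig):+.1f}%")
```

The changes are +0.4%, +7.4%, +1.1% and +2.5% for μ = 5, 20, 100 and 500, so pigauto's RMSE is above the BM baseline's in all four scenarios.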

Primary sweep: performance by mean count (25% missingness)

Average across 3 traits and 5 replicates. ★ marks the best method per scenario.

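              RMSE (lower is better)      MAE (lower is better)       Pearson r (higher is better)
Mean count    Mean    BM       pigauto    Mean    BM       pigauto    Mean    BM       pigauto
μ = 5         1.029   0.696★   0.699      0.830   0.557★   0.562      n/a     0.724    0.724
μ = 20        0.978   0.557★   0.598      0.771   0.447★   0.479      n/a     0.819★   0.810
μ = 100       1.045   0.529★   0.535      0.844   0.414★   0.419      n/a     0.860★   0.858
μ = 500       1.032   0.486★   0.498      0.825   0.383★   0.391      n/a     0.884★   0.882

Mean = mean imputation; BM = BM baseline (Rphylopars); pigauto = BM + GNN. Mean imputation has no Pearson r because its predictions are constant. At μ = 5, BM and pigauto tie on Pearson r (0.724).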
Figures: RMSE, MAE and Pearson r by mean count (x axis: μ = 5, 20, 100, 500). Each figure compares Mean imputation, BM baseline (Rphylopars) and pigauto (BM + GNN). The plotted values match the table above, and the Pearson r panel has no Mean imputation series.
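The following is a minimal sketch of the three metrics computed over the held-out cells of one trait, followed by averaging over the 3 traits × 5 replicates. It assumes that NaN correlations are skipped when averaging, which is consistent with the empty Mean column for Pearson r. Both function names are hypothetical and are not taken from the benchmark scripts.

```python
import math

def impute_metrics(truth, pred):
    """RMSE, MAE and Pearson r over the held-out cells of one trait.

    Pearson r is NaN when either vector is constant, which is always the case
    for mean imputation (every held-out cell receives the same value).
    """
    n = len(truth)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / n)
    mae = sum(abs(t - p) for t, p in zip(truth, pred)) / n
    if min(truth) == max(truth) or min(pred) == max(pred):
        return rmse, mae, math.nan
    mt, mp = sum(truth) / n, sum(pred) / n
    sxy = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    sxx = sum((t - mt) ** 2 for t in truth)
    syy = sum((p - mp) ** 2 for p in pred)
    return rmse, mae, sxy / math.sqrt(sxx * syy)

def average_over_runs(results):
    """Average each metric over (trait, replicate) runs, skipping NaN entries."""
    averaged = []
    for values in zip(*results):
        finite = [v for v in values if not math.isnan(v)]
        averaged.append(sum(finite) / len(finite) if finite else math.nan)
    return tuple(averaged)

truth = [0.2, -1.1, 0.7, 1.5]
print(impute_metrics(truth, [0.1, -0.9, 0.5, 1.2]))  # e.g. a BM-style prediction
print(impute_metrics(truth, [0.0, 0.0, 0.0, 0.0]))   # mean imputation: r is NaN
```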

Secondary sweep: Poisson vs Negative Binomial (mean = 20)

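Overdispersed counts (NegBin) are harder for the phylogenetic methods. BM RMSE rises from 0.586 (Poisson) to 0.770 (NegBin), and pigauto RMSE rises from 0.601 to 0.789. Mean imputation is essentially unchanged (1.007 vs 1.002).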

RMSE: Poisson vs NegBin (mean = 20, lower is better)

Distribution   Mean imputation   BM baseline (Rphylopars)   pigauto (BM + GNN)
Poisson        1.007             0.586                      0.601
NegBin         1.002             0.770                      0.789
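The following sketch shows what overdispersion means at a shared mean of 20. A Poisson count has variance equal to its mean. A negative binomial with dispersion `size` (drawn here as a Gamma-Poisson mixture) has variance μ + μ²/size. The value `size = 2` is a hypothetical choice, because the dispersion used by simulate_count_traits() is not given in this report.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; adequate for moderate lam (exp(-lam) must not underflow)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def negbin_draw(rng, mu, size):
    """Negative binomial via Gamma-Poisson: lam ~ Gamma(size, mu / size), y ~ Poisson(lam).

    Mean mu, variance mu + mu**2 / size.
    """
    return poisson_draw(rng, rng.gammavariate(size, mu / size))

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

rng = random.Random(42)
MU, SIZE, N = 20, 2.0, 20_000  # SIZE is a hypothetical dispersion value
pois = [poisson_draw(rng, MU) for _ in range(N)]
nb = [negbin_draw(rng, MU, SIZE) for _ in range(N)]
print("Poisson mean/var:", sum(pois) / N, sample_var(pois))  # theory: 20 / 20
print("NegBin  mean/var:", sum(nb) / N, sample_var(nb))      # theory: 20 / 220
```

With the same mean, the NegBin draws spread about ten times more widely under this assumed dispersion. That extra spread is noise that no phylogenetic predictor can recover.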

What the benchmark shows

Reproducibility

Driver: script/bench_count.R. Tree: ape::rtree(300). Traits: simulate_count_traits(). Training: up to 500 epochs with early stopping. To reproduce, run Rscript script/bench_count.R, then Rscript script/make_bench_count_html.R.
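The following is a generic, patience-based sketch of a 500-epoch budget with early stopping. It assumes validation loss is checked once per epoch. The `patience` and `min_delta` values and the `fit_with_early_stopping` helper are hypothetical, and pigauto's actual stopping rule is not described in this report.

```python
import math
from typing import Callable, Tuple

def fit_with_early_stopping(
    step: Callable[[int], float],  # runs one epoch, returns validation loss
    max_epochs: int = 500,
    patience: int = 20,            # hypothetical: epochs to wait for an improvement
    min_delta: float = 0.0,        # hypothetical: minimum improvement that counts
) -> Tuple[int, float, int]:
    """Return (best_epoch, best_loss, epochs_run)."""
    best_loss, best_epoch, epochs_run = math.inf, -1, 0
    for epoch in range(max_epochs):
        loss = step(epoch)
        epochs_run = epoch + 1
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_loss, epochs_run

# Toy validation curve with its minimum at epoch 100
print(fit_with_early_stopping(lambda e: (e - 100) ** 2 / 1e4 + 0.5))
```

In practice, the model weights from `best_epoch` would be kept rather than those from the final epoch.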