Count-trait benchmark: Poisson / NegBin

Tree: ape::rtree(300) · Traits: 3 count per scenario · Mean counts: 5 – 500 · Methods: mean · BM baseline (Rphylopars) · pigauto · Replicates: 5 · Missingness: 25% MCAR · Commit c48a42ce6a · Run on 2026-05-11 10:56 · Total wall: 12.4 min

Bottom line. For sparse counts (mean = 5), the BM baseline achieves RMSE 0.699 and pigauto 0.730 on the log1p-z scale, so pigauto is about 4.4% worse than the baseline. Low counts are inherently noisy, which limits all methods.

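For dense counts (mean = 500), pigauto's RMSE is 0.485 against 0.484 for the BM baseline (a lift of -0.2%), which is effectively a tie. The log1p transform compresses the scale, and the observed GNN lift is scenario-dependent: it is at or below zero at every mean count in the primary sweep.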
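The log1p-z scale used above is not defined in this report. A minimal Python sketch, assuming it means a log1p transform followed by z-scoring with the observed mean and standard deviation, shows how RMSE and MAE on such a scale could be computed. The function names and toy counts are illustrative, not pigauto's API or benchmark data.

```python
import math
from statistics import mean, pstdev


def log1p_z(counts):
    """log1p-transform counts, then z-score them (assumed meaning of 'log1p-z')."""
    logged = [math.log1p(c) for c in counts]
    mu, sd = mean(logged), pstdev(logged)
    return [(x - mu) / sd for x in logged]


def rmse(truth, pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(truth, pred)))


def mae(truth, pred):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(t - p) for t, p in zip(truth, pred))


# Toy counts, made up for illustration (not benchmark data).
observed = [3, 7, 4, 0, 12, 5, 6, 2]
z = log1p_z(observed)

# Mean imputation predicts the mean, which is 0 on the standardized scale.
mean_pred = [0.0] * len(z)
print(round(rmse(z, mean_pred), 3), round(mae(z, mean_pred), 3))
```

Under this assumption, predicting the mean gives an RMSE of exactly 1 on the data used for standardization, which is a convenient reference point for reading the mean-imputation RMSE values of roughly 1.0 reported here.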

Primary sweep: performance by mean count (25% missingness)

Averaged across 3 traits and 5 replicates. ★ marks the best method per scenario.

Mean count  RMSE (lower is better)        MAE (lower is better)         Pearson r (higher is better)
            Mean      BM        pigauto   Mean      BM        pigauto   Mean      BM        pigauto
μ = 5       1.029     0.699 ★   0.730     0.830     0.558 ★   0.582     -         0.723 ★   0.716
μ = 20      0.978     0.559 ★   0.586     0.771     0.447 ★   0.470     -         0.818     0.820 ★
μ = 100     1.045     0.528 ★   0.534     0.844     0.413 ★   0.420     -         0.860     0.860
μ = 500     1.032     0.484 ★   0.485     0.825     0.380 ★   0.381     -         0.886 ★   0.885
[Figure: RMSE by mean count. x-axis: mean count (μ = 5, 20, 100, 500); y-axis: RMSE; series: mean imputation, BM baseline (Rphylopars), pigauto (BM + GNN).]
[Figure: MAE by mean count. x-axis: mean count (μ = 5, 20, 100, 500); y-axis: MAE; series: mean imputation, BM baseline (Rphylopars), pigauto (BM + GNN).]
[Figure: Pearson r by mean count. x-axis: mean count (μ = 5, 20, 100, 500); y-axis: Pearson r; series: mean imputation, BM baseline (Rphylopars), pigauto (BM + GNN).]
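The percentage quoted in the bottom line can be recomputed from the RMSE values of the primary sweep. The sketch below assumes that lift is defined as the relative RMSE reduction versus the BM baseline, (BM - pigauto) / BM, which reproduces the reported -0.2% at mean = 500. The definition is inferred from that figure, not taken from the benchmark scripts.

```python
# RMSE values copied from the primary sweep: mean count -> (BM baseline, pigauto).
rmse_by_mean = {
    5: (0.699, 0.730),
    20: (0.559, 0.586),
    100: (0.528, 0.534),
    500: (0.484, 0.485),
}


def lift_percent(baseline, candidate):
    """Relative RMSE reduction in percent; negative means the candidate is worse."""
    return 100.0 * (baseline - candidate) / baseline


lifts = {mu: round(lift_percent(bm, pig), 1) for mu, (bm, pig) in rmse_by_mean.items()}
print(lifts)  # {5: -4.4, 20: -4.8, 100: -1.1, 500: -0.2}
```

On this definition the gap between pigauto and the baseline narrows as counts get denser, from about 4 to 5% at low means to 0.2% at mean = 500.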

Secondary sweep: Poisson vs Negative Binomial (mean = 20)

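Overdispersed counts (NegBin) are harder for both phylogenetic methods: RMSE rises from 0.585 to 0.777 for the BM baseline and from 0.586 to 0.770 for pigauto. Mean imputation is essentially unchanged (1.007 vs 1.002).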

[Figure: RMSE, Poisson vs NegBin (mean = 20). x-axis: distribution; y-axis: RMSE.]

Distribution  Mean imputation  BM baseline (Rphylopars)  pigauto (BM + GNN)
Poisson       1.007            0.585                     0.586
NegBin        1.002            0.777                     0.770
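To illustrate why the NegBin scenario is noisier at the same mean, the sketch below draws Poisson and negative binomial counts with mean 20 and compares their sample variances. The negative binomial is simulated as a gamma-Poisson mixture with variance mu + mu^2 / size. The dispersion size = 2 is an assumption chosen for illustration; this report does not state the value used by simulate_count_traits().

```python
import math
import random
from statistics import mean, pvariance


def poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def negbin(rng, mu, size):
    """Gamma-Poisson mixture: mean mu, variance mu + mu**2 / size."""
    lam = rng.gammavariate(size, mu / size)  # shape = size, scale = mu / size
    return poisson(rng, lam)


rng = random.Random(42)
MU, SIZE, N = 20, 2, 5000  # SIZE is an assumed dispersion, not from the report

pois = [poisson(rng, MU) for _ in range(N)]
nb = [negbin(rng, MU, SIZE) for _ in range(N)]

print("Poisson mean/var:", round(mean(pois), 2), round(pvariance(pois), 2))
print("NegBin  mean/var:", round(mean(nb), 2), round(pvariance(nb), 2))
```

With these settings the theoretical variances are 20 for Poisson and 20 + 400 / 2 = 220 for NegBin, so the same mean count carries far more spread under overdispersion.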

What the benchmark shows

Reproducibility

Driver: script/bench_count.R. Tree: ape::rtree(300). Traits: simulate_count_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_count.R, then Rscript script/make_bench_count_html.R.
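The setup specifies 25% MCAR missingness. A minimal sketch of such masking is shown below, assuming exactly 25% of all cells in a 300-species by 3-trait matrix are hidden uniformly at random. The helper mask_mcar is hypothetical; the actual driver may instead mask per trait or hide each cell independently with probability 0.25.

```python
import random


def mask_mcar(matrix, frac, seed=0):
    """Hide a fraction of all cells uniformly at random (MCAR).

    Returns a masked copy (hidden cells set to None) and the set of hidden
    (row, column) indices, which serve as the held-out evaluation cells.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = set(rng.sample(cells, round(frac * len(cells))))
    masked = [
        [None if (i, j) in hidden else v for j, v in enumerate(row)]
        for i, row in enumerate(matrix)
    ]
    return masked, hidden


# 300 species x 3 traits, matching the tree size and trait count in the setup.
# The values are placeholders, not simulated count traits.
traits = [[i + j for j in range(3)] for i in range(300)]
masked, hidden = mask_mcar(traits, 0.25)
print(len(hidden), "of", 300 * 3, "cells hidden")
```

Because the hidden cells are chosen independently of the trait values and of the tree, the missingness carries no information about the counts themselves, which is what MCAR means.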