Tree: ape::rtree(300)
Traits: 3 count traits per scenario
Mean counts: 5 – 500
Methods: mean, BM baseline (Rphylopars), pigauto
Replicates: 5
Missingness: 25% MCAR
Commit: c48a42ce6a
Run on: 2026-05-11 10:56
Total wall time: 12.4 min
Bottom line. For sparse counts (mean = 5), the BM baseline reaches RMSE 0.699 and pigauto reaches 0.730 on the log1p-z scale. Low counts are inherently noisy, which limits every method.
For dense counts (mean = 500), pigauto's RMSE is 0.485 against the BM baseline's 0.484, about 0.2% worse. The log1p transform compresses the scale, but the observed GNN lift is scenario-dependent.
Primary sweep: performance by mean count (25% missingness)
Average across 3 traits and 5 replicates. ★ marks the best method per scenario.
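RMSE and MAE: lower is better. Pearson r: higher is better. Columns are grouped by metric, then by method (Mean, BM, pigauto).

| Mean count | RMSE: Mean | RMSE: BM | RMSE: pigauto | MAE: Mean | MAE: BM | MAE: pigauto | Pearson r: Mean | Pearson r: BM | Pearson r: pigauto |
|---|---|---|---|---|---|---|---|---|---|
| μ = 5 | 1.029 | 0.699 ★ | 0.730 | 0.830 | 0.558 ★ | 0.582 | – | 0.723 ★ | 0.716 |
| μ = 20 | 0.978 | 0.559 ★ | 0.586 | 0.771 | 0.447 ★ | 0.470 | – | 0.818 | 0.820 ★ |
| μ = 100 | 1.045 | 0.528 ★ | 0.534 | 0.844 | 0.413 ★ | 0.420 | – | 0.860 | 0.860 ★ |
| μ = 500 | 1.032 | 0.484 ★ | 0.485 | 0.825 | 0.380 ★ | 0.381 | – | 0.886 ★ | 0.885 |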
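For readers who want to see how the three metrics are typically computed on held-out cells, the following is a minimal base-R sketch. It is not the benchmark's code: the helper names (rmse, mae, pearson_r), the stand-in "model" prediction, and the simulated trait are all assumptions made for illustration. It also shows why a constant mean prediction has no Pearson r.

```r
# Hypothetical helpers; not taken from pigauto or Rphylopars.
rmse <- function(truth, pred) sqrt(mean((truth - pred)^2))
mae  <- function(truth, pred) mean(abs(truth - pred))
pearson_r <- function(truth, pred) {
  # A constant prediction has no variance, so Pearson r is undefined.
  if (length(unique(pred)) < 2) return(NA_real_)
  cor(truth, pred)
}

set.seed(1)
n <- 300                                        # one value per tip of a 300-tip tree
truth <- rnorm(n)                               # stand-in for a trait on the log1p-z scale
held_out <- sample(n, size = round(0.25 * n))   # 25% MCAR: cells masked uniformly at random

# Stand-in "model": truth plus noise (an assumption, not a real imputer).
pred_model <- truth[held_out] + rnorm(length(held_out), sd = 0.5)
# Mean baseline: the observed-cell mean, repeated for every held-out cell.
pred_mean <- rep(mean(truth[-held_out]), length(held_out))

scores <- c(
  rmse_model = rmse(truth[held_out], pred_model),
  mae_model  = mae(truth[held_out], pred_model),
  r_model    = pearson_r(truth[held_out], pred_model),
  rmse_mean  = rmse(truth[held_out], pred_mean)
)
print(round(scores, 3))

r_mean <- pearson_r(truth[held_out], pred_mean)
print(r_mean)                                   # NA: undefined for a constant prediction
```

On a z-scored scale a constant-mean prediction should land near RMSE 1, which is consistent with the Mean column, and its undefined correlation is presumably why that column shows "–" for Pearson r.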
Secondary sweep: Poisson vs Negative Binomial (mean = 20)
Overdispersed counts (NegBin) are harder for all methods.
What the benchmark shows
The log1p-z pipeline handles count data well. Counts are transformed via log1p then z-scored before entering the latent matrix. This brings sparse and dense counts onto a common scale where BM can operate.
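The transform described above can be sketched in a few lines of base R. This is an illustration under assumptions, not pigauto's implementation: the function names log1p_z and inv_log1p_z are hypothetical, and the choice to compute the mean and standard deviation from observed cells only is assumed.

```r
# Hypothetical helper; pigauto's own implementation may differ.
log1p_z <- function(counts) {
  logged <- log1p(counts)                   # log(1 + x), defined at zero
  mu <- mean(logged, na.rm = TRUE)          # statistics from observed cells only (assumed)
  sigma <- sd(logged, na.rm = TRUE)
  list(z = (logged - mu) / sigma, mu = mu, sigma = sigma)
}

# Inverse: back from the latent scale to the count scale.
inv_log1p_z <- function(z, mu, sigma) expm1(z * sigma + mu)

set.seed(42)
sparse <- rpois(300, lambda = 5)
dense  <- rpois(300, lambda = 500)
sparse[sample(300, 75)] <- NA               # mask 25% of cells

zs <- log1p_z(sparse)
zd <- log1p_z(dense)

# Both traits now have mean 0 and sd 1 over their observed cells.
print(c(mean(zs$z, na.rm = TRUE), sd(zs$z, na.rm = TRUE)))
print(c(mean(zd$z), sd(zd$z)))

# The round trip recovers the original counts up to floating-point error.
back <- inv_log1p_z(zd$z, zd$mu, zd$sigma)
print(max(abs(back - dense)))
```

Keeping mu and sigma alongside the z-scores is what allows predictions on the latent scale to be mapped back to counts.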
Sparse counts (mean = 5) are inherently noisy. All methods struggle because the signal-to-noise ratio is low in log-space for small integers. The GNN cannot extract structure that is not there.
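A quick simulation makes the signal-to-noise point concrete. The sketch below assumes pure Poisson sampling noise around a fixed rate, which isolates the sampling component only; it does not reproduce simulate_count_traits(), where the rate presumably varies along the tree.

```r
set.seed(7)
means <- c(5, 20, 100, 500)                 # the sweep's mean counts

# Empirical sd of log1p(X) when X is Poisson with a fixed rate.
noise_sd <- sapply(means, function(m) sd(log1p(rpois(1e5, lambda = m))))
names(noise_sd) <- paste0("mu=", means)
print(round(noise_sd, 3))

# Delta-method approximation: sd(log1p(X)) is roughly sqrt(mu) / (1 + mu).
approx_sd <- sqrt(means) / (1 + means)
names(approx_sd) <- names(noise_sd)
print(round(approx_sd, 3))
```

Sampling noise on the log1p scale shrinks steadily as the mean grows, so at mean = 5 a much larger share of the observed variation is noise that no imputation method can predict.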
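Dense counts (mean ≥ 100) are not automatically easier for the GNN. The log1p transform gives a smoother latent scale, but pigauto stays level with or slightly behind BM: RMSE is 0.534 vs 0.528 at mean = 100 and 0.485 vs 0.484 at mean = 500, with Pearson r within 0.001 in both cases.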
Negative-binomial overdispersion increases error for all methods. The extra variance makes all predictions less precise, and the method ordering should be read from the table rather than assumed.
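To illustrate what overdispersion means at the secondary sweep's mean of 20, the sketch below compares Poisson and negative-binomial draws in base R. The dispersion value size = 2 is a hypothetical choice for illustration; the benchmark's actual dispersion parameter is not stated here.

```r
set.seed(11)
n <- 1e4
mu <- 20                                    # the secondary sweep's mean count
size <- 2                                   # hypothetical dispersion; not from the benchmark

pois <- rpois(n, lambda = mu)               # variance = mu
nb   <- rnbinom(n, size = size, mu = mu)    # variance = mu + mu^2 / size

summary_tab <- rbind(
  poisson = c(mean = mean(pois), var = var(pois), sd_log1p = sd(log1p(pois))),
  negbin  = c(mean = mean(nb),   var = var(nb),   sd_log1p = sd(log1p(nb)))
)
print(round(summary_tab, 2))
```

Both samples share the same mean, but the negative-binomial draws carry far more variance on the raw and log1p scales, which is the extra noise every method has to absorb.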
Reproducibility
Driver: script/bench_count.R. Tree: ape::rtree(300). Traits: simulate_count_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_count.R, then Rscript script/make_bench_count_html.R.