Count-trait benchmark: Poisson / NegBin

Tree: ape::rtree(300) · Traits: 3 count traits per scenario · Mean counts: 5 to 500 · Methods: mean, BM baseline (Rphylopars), pigauto · Replicates: 5 · Missingness: 25% MCAR · Commit c48a42ce6a · Run on 2026-05-11 10:56 · Total wall time: 12.4 min

Bottom line. For sparse counts (mean = 5), the BM baseline achieves RMSE 0.699 and pigauto achieves 0.730 on the log1p-z scale, so BM is about 4% better. Low counts are inherently noisy, which limits every method.

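For dense counts (mean = 500), pigauto's RMSE is 0.485 against 0.484 for BM, a 0.2% increase that amounts to a tie. The gap is 4.4% and 4.8% at means 5 and 20 and narrows to 1.1% at mean 100, but BM has the lower RMSE and MAE in every scenario, so the log1p transform alone does not give the GNN an edge.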

Primary sweep: performance by mean count (25% missingness)

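Averages across 3 traits and 5 replicates, on the log1p-z scale. ★ marks the best method for each metric within a scenario. Pearson r is not reported for the mean baseline (–), since a constant imputation has no variance and its correlation is undefined.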

	RMSE (lower is better)			MAE (lower is better)			Pearson r (higher is better)
Mean count	Mean baseline	BM	pigauto	Mean baseline	BM	pigauto	Mean baseline	BM	pigauto
μ = 5	1.029	0.699 ★	0.730	0.830	0.558 ★	0.582	–	0.723 ★	0.716
μ = 20	0.978	0.559 ★	0.586	0.771	0.447 ★	0.470	–	0.818	0.820 ★
μ = 100	1.045	0.528 ★	0.534	0.844	0.413 ★	0.420	–	0.860	0.860 ★
μ = 500	1.032	0.484 ★	0.485	0.825	0.380 ★	0.381	–	0.886 ★	0.885
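The relative RMSE gaps quoted in this report can be recomputed directly from the table. The minimal sketch below copies the RMSE values from the primary sweep; the sign convention (positive means pigauto's RMSE is higher than BM's, i.e. worse) and the helper name `relative_change` are illustrative choices, not part of the benchmark scripts.

```python
# RMSE values copied from the primary-sweep table (log1p-z scale).
RMSE = {
    5: {"mean": 1.029, "bm": 0.699, "pigauto": 0.730},
    20: {"mean": 0.978, "bm": 0.559, "pigauto": 0.586},
    100: {"mean": 1.045, "bm": 0.528, "pigauto": 0.534},
    500: {"mean": 1.032, "bm": 0.484, "pigauto": 0.485},
}


def relative_change(new: float, ref: float) -> float:
    """Percent change of `new` relative to `ref`; positive means `new` is larger (worse for RMSE)."""
    return 100.0 * (new - ref) / ref


for mu, row in RMSE.items():
    gap = relative_change(row["pigauto"], row["bm"])
    print(f"mu = {mu:>3}: pigauto vs BM RMSE {gap:+.1f}%")
# prints +4.4%, +4.8%, +1.1%, +0.2% for mu = 5, 20, 100, 500
```

Every gap is positive, so BM leads on RMSE throughout; the gap simply narrows at high mean counts.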

Secondary sweep: Poisson vs Negative Binomial (mean = 20)

Overdispersed counts (NegBin) are harder for all methods.
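Why both low means and overdispersion hurt can be seen from the observation noise alone. The sketch below computes the exact variance of log1p(X) for a count X with a fixed expected value, i.e. the noise that sits on top of any phylogenetic signal once counts are log-transformed. It assumes counts are drawn from a Poisson or a mean/size negative binomial around a latent mean; the `SIZE = 2.0` dispersion is hypothetical, and `simulate_count_traits()` may use a different value or parameterization.

```python
import math


def poisson_logpmf(k: int, mu: float) -> float:
    """log P(X = k) for X ~ Poisson(mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def negbin_logpmf(k: int, mu: float, size: float) -> float:
    """log P(X = k) for a negative binomial with mean mu and variance mu + mu**2 / size."""
    p_stop = size / (size + mu)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p_stop) + k * math.log(1.0 - p_stop))


def log1p_variance(logpmf, mean: float, var: float) -> float:
    """Exact Var[log1p(X)], summing the pmf over a support wide enough to hold ~all mass."""
    kmax = int(mean + 40.0 * math.sqrt(var)) + 50
    probs = [math.exp(logpmf(k)) for k in range(kmax + 1)]
    total = sum(probs)  # renormalise for the (negligible) truncated tail
    m1 = sum(p * math.log1p(k) for k, p in enumerate(probs)) / total
    return sum(p * (math.log1p(k) - m1) ** 2 for k, p in enumerate(probs)) / total


SIZE = 2.0  # hypothetical NegBin size parameter

for mu in (5, 20, 100, 500):
    exact = log1p_variance(lambda k: poisson_logpmf(k, mu), mu, mu)
    delta = mu / (1.0 + mu) ** 2  # delta-method approximation
    print(f"Poisson mu={mu:>3}: Var[log1p X] = {exact:.4f} (delta method {delta:.4f})")

nb_var = 20 + 20 ** 2 / SIZE
nb = log1p_variance(lambda k: negbin_logpmf(k, 20, SIZE), 20, nb_var)
print(f"NegBin  mu= 20, size={SIZE}: Var[log1p X] = {nb:.4f}")
```

The Poisson noise shrinks roughly as 1/μ, and at the same mean the negative binomial adds substantially more. How much this degrades imputation also depends on the size of the between-species signal, which this sketch does not model.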

What the benchmark shows

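The log1p-z pipeline puts every scenario on a common scale. Counts are transformed with log1p(x) = log(1 + x) and then z-scored before entering the latent matrix, which brings sparse and dense counts onto a scale where BM can operate. The mean baseline's RMSE of roughly 1.0 in every scenario (0.978 to 1.045) is consistent with this: after z-scoring, predicting the mean leaves an error of about one standard deviation.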
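Sparse counts (mean = 5) are inherently noisy. Poisson variance equals the mean, so on the log scale the sampling noise of a single count is largest at small means: by the delta method, Var[log1p(X)] ≈ μ/(1 + μ)², about 0.14 at μ = 5 versus 0.002 at μ = 500. With a low signal-to-noise ratio, both phylogenetic methods post their highest RMSE and lowest Pearson r at μ = 5; the GNN cannot extract structure that is not there.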
Dense counts (mean = 100+) are not automatically easier for the GNN. The log1p transform gives a smoother latent scale, but at μ = 100 and μ = 500 pigauto still trails BM slightly on RMSE (by 1.1% and 0.2%) and on MAE, and matches it on Pearson r to within 0.001.
Negative-binomial overdispersion increases error for all methods. The variance exceeds the mean (μ + μ²/size in the usual mean/size parameterization, against μ for Poisson), which makes every prediction less precise; read the method ordering from the table rather than assuming it.
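The log1p-z step described in the first point can be sketched as follows. This is a minimal illustration, not pigauto's code: the helper names `log1p_z` and `inverse_log1p_z` are hypothetical, and it assumes each trait is centered and scaled using the mean and sample SD of its observed cells only, with missing cells passed through as `None`.

```python
import math
import statistics


def log1p_z(counts):
    """Forward transform for one trait: log1p, then z-score using observed cells only.

    Missing cells are None and stay None. Returns (z, center, scale) so the
    transform can be inverted after imputation.
    """
    logged = [None if c is None else math.log1p(c) for c in counts]
    observed = [v for v in logged if v is not None]
    center = statistics.mean(observed)
    scale = statistics.stdev(observed)  # sample SD (assumption; population SD is also common)
    z = [None if v is None else (v - center) / scale for v in logged]
    return z, center, scale


def inverse_log1p_z(z, center, scale):
    """Back-transform z-scores to the count scale (real-valued; round if integers are needed)."""
    return [None if v is None else math.expm1(v * scale + center) for v in z]


# Toy traits with hypothetical values: one sparse, one dense, each with a missing cell.
sparse = [0, 2, 5, None, 7, 3, 9, 4]
dense = [410, 520, None, 480, 615, 390, 505, 560]

for name, trait in (("sparse", sparse), ("dense", dense)):
    z, center, scale = log1p_z(trait)
    obs = [v for v in z if v is not None]
    print(f"{name}: mean(z) = {statistics.mean(obs):+.3f}, sd(z) = {statistics.stdev(obs):.3f}")
```

Both traits end up with observed z-scores of mean 0 and SD 1 despite means two orders of magnitude apart. Computing the center and scale from observed cells only keeps held-out values from leaking into the transform.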

Reproducibility

Driver: script/bench_count.R. Tree: ape::rtree(300). Traits: simulate_count_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_count.R, then Rscript script/make_bench_count_html.R.

Source: script/bench_count.R · Results: script/bench_count.rds · Report: pkgdown/assets/dev/bench_count.html