Zero-inflated count benchmark: zero fraction sweep

Tree: ape::rtree(300) · Traits: 3 zero-inflated count traits per scenario · Zero fraction: 0.2 – 0.8 · Methods: mean imputation, LP + BM baseline, pigauto (LP + BM + GNN) · Replicates: 5 · Missingness: 25% MCAR · Commit 794537121b · Report generated 2026-05-30 12:04 · Total wall time: 93.8 min

Bottom line. At 20% zero inflation, the baseline achieves RMSE 38.167 and pigauto achieves 39.402. With few structural zeros the count component dominates and BM on the log1p-z scale performs well.
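As an illustration of the log1p-z scale mentioned above, here is a minimal sketch. It assumes "log1p-z" means applying log(1 + y) and then standardizing to zero mean and unit variance. The function names are hypothetical and are not taken from pigauto.

```python
import math
import statistics

def log1p_z(counts):
    """Map raw counts to an assumed log1p-z scale: log(1 + y), then z-score."""
    logged = [math.log1p(y) for y in counts]
    mu = statistics.fmean(logged)
    sd = statistics.pstdev(logged)  # population SD; assumes counts are not all equal
    z = [(v - mu) / sd for v in logged]
    return z, mu, sd

def inverse_log1p_z(z, mu, sd):
    """Map values on the log1p-z scale back to non-negative integer counts."""
    return [max(0, round(math.expm1(v * sd + mu))) for v in z]

# Toy zero-inflated counts (made up for illustration, not benchmark data).
counts = [0, 0, 3, 12, 0, 47, 5]
z, mu, sd = log1p_z(counts)
recovered = inverse_log1p_z(z, mu, sd)
```

In a real imputation setting the mean and standard deviation would presumably be estimated from observed cells only. The sketch uses all values to keep the round trip easy to check.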

At 80% zero inflation, the two methods are nearly tied on RMSE, 35.991 (baseline) vs 36.040 (pigauto), while zero-class accuracy falls to 56.4% vs 52.4%. Heavy zero inflation creates a two-component mixture that challenges all methods; read the GNN contribution from the RMSE and zero-accuracy columns together.
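The two metrics quoted above can be sketched as follows. The RMSE formula is standard. The zero-class accuracy definition is an assumption: the share of held-out cells whose zero or non-zero status is predicted correctly, with predictions below a hypothetical threshold of 0.5 counted as zero. The benchmark's own definition may differ.

```python
import math

def rmse(truth, pred):
    """Root mean squared error over held-out cells."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def zero_class_accuracy(truth, pred, threshold=0.5):
    """Share of cells whose zero / non-zero status is predicted correctly.

    Assumption: a prediction below `threshold` is classified as zero.
    """
    hits = sum((t == 0) == (p < threshold) for t, p in zip(truth, pred))
    return hits / len(truth)

# Toy held-out values (made up for illustration, not benchmark data).
truth = [0, 0, 4, 10, 0, 25]
pred = [0.2, 1.3, 3.0, 12.0, 0.0, 20.0]

error = rmse(truth, pred)
accuracy = zero_class_accuracy(truth, pred)
```

In the toy data the second cell is a true zero predicted as 1.3. It adds little to RMSE but counts as a full miss for zero-class accuracy, which is why the two columns are best read together.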

Primary sweep: performance by zero fraction (25% missingness)

Averaged across 3 traits and 5 replicates. ★ marks the best method per scenario.

Zero fraction   RMSE (lower is better)            Zero accuracy (higher is better)
                Mean      Baseline    pigauto     Mean       Baseline   pigauto
20% zeros       58.059    38.167 ★    39.402      1.000 ★    0.937      0.936
40% zeros       34.494    25.787 ★    26.301      1.000 ★    0.829      0.828
60% zeros       57.595    44.472 ★    45.200      1.000 ★    0.812      0.803
80% zeros       41.886    35.991 ★    36.040      1.000 ★    0.564      0.524
[Figure: RMSE by zero fraction. x-axis: zero fraction (20%, 40%, 60%, 80% zeros); y-axis: RMSE; series: mean imputation, LP + BM baseline, pigauto (LP + BM + GNN).]
[Figure: Zero-class accuracy by zero fraction. x-axis: zero fraction (20%, 40%, 60%, 80% zeros); y-axis: zero accuracy; series: mean imputation, LP + BM baseline, pigauto (LP + BM + GNN).]
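To make the size of the RMSE differences in the primary sweep easier to judge, the sketch below expresses pigauto's RMSE as a percentage gap relative to the LP + BM baseline. The numbers are copied from the primary-sweep results. The helper name is hypothetical.

```python
# RMSE values copied from the primary sweep (25% missingness).
rmse_table = {
    "20% zeros": {"mean": 58.059, "baseline": 38.167, "pigauto": 39.402},
    "40% zeros": {"mean": 34.494, "baseline": 25.787, "pigauto": 26.301},
    "60% zeros": {"mean": 57.595, "baseline": 44.472, "pigauto": 45.200},
    "80% zeros": {"mean": 41.886, "baseline": 35.991, "pigauto": 36.040},
}

def relative_gap(row):
    """Percent by which pigauto's RMSE exceeds the baseline's RMSE."""
    return 100.0 * (row["pigauto"] - row["baseline"]) / row["baseline"]

gaps = {scenario: relative_gap(row) for scenario, row in rmse_table.items()}

for scenario, gap in gaps.items():
    print(f"{scenario}: pigauto RMSE is {gap:+.2f}% relative to baseline")
```

The gap is positive in every scenario and shrinks from roughly 3% at 20% zeros to well under 1% at 80% zeros. The report gives no spread across the 5 replicates, so small gaps should be read with caution.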

Secondary sweep: non-zero mean (zero frac = 0.5)

How the mean of non-zero counts affects imputation quality at fixed 50% zero inflation.
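The two knobs in these sweeps, zero fraction and the mean of the non-zero counts, can be illustrated with a deliberately simplified simulation. This sketch ignores phylogenetic signal entirely and does not attempt to reproduce simulate_zi_count_traits(). It assumes structural zeros drawn independently per tip, a shifted Poisson for the non-zero component (so non-zero draws are at least 1), and an MCAR mask over exactly 25% of cells. All function names are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_zi_counts(n, zero_frac, nonzero_mean, rng):
    """Zero with probability zero_frac, else 1 + Poisson(nonzero_mean - 1)."""
    values = []
    for _ in range(n):
        if rng.random() < zero_frac:
            values.append(0)  # structural zero
        else:
            values.append(1 + poisson(rng, nonzero_mean - 1))
    return values

def mcar_mask(n, missing_frac, rng):
    """Indices hidden completely at random: exactly round(n * missing_frac) cells."""
    return set(rng.sample(range(n), round(n * missing_frac)))

rng = random.Random(1)
y = simulate_zi_counts(300, zero_frac=0.5, nonzero_mean=20, rng=rng)
hidden = mcar_mask(len(y), 0.25, rng)
observed = [None if i in hidden else v for i, v in enumerate(y)]
```

An imputation method sees only `observed`; the values at the `hidden` indices serve as ground truth for scoring. Raising `nonzero_mean` widens the gap between the zero and non-zero components, and absolute errors on the raw count scale grow with it.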

[Figure: RMSE by non-zero mean (zero frac = 0.5). x-axis: mean of non-zero counts; y-axis: RMSE; series: mean imputation, LP + BM baseline, pigauto (LP + BM + GNN).]

Values shown in the figure:

Mean of non-zero counts   Mean imputation   LP + BM baseline   pigauto (LP + BM + GNN)
μ(nz) = 5                 17.347            10.571             11.012
μ(nz) = 20                32.567            23.784             23.546
μ(nz) = 100               478.719           259.481            257.740

What the benchmark shows

Reproducibility

Driver: script/bench_zi_count.R. Tree: ape::rtree(300). Traits: simulate_zi_count_traits(). Training: 500 epochs with early stopping. To reproduce: Rscript script/bench_zi_count.R, then Rscript script/make_bench_zi_count_html.R.