Multi-observation imputation benchmark

pigauto v0.6.0 — observation-level covariates · 200 species · 5 obs/species · 2 replicates · 44.0 min compute · Generated 2026-05-11 10:58

1. The problem

Comparative datasets increasingly contain multiple data points per species measured under different experimental or environmental conditions: critical thermal maximum (CT_max) at different acclimation temperatures, metabolic rate at different body temperatures, performance at different substrate concentrations. Missing data is ubiquitous in these datasets, and different species may be missing measurements at different condition levels.

The challenge: can imputation methods use observation-level covariates (the experimental condition under which each measurement was taken) to produce covariate-conditional predictions? A species measured at 20°C acclimation should receive a different CT_max imputation than the same species measured at 30°C. Standard phylogenetic imputation methods that operate at the species level cannot make this distinction.

Key question: Does supplying observation-level covariates to pigauto improve imputation accuracy when within-species variation is driven by a measurable condition variable?

2. Simulation design

The data-generating process simulates a thermal physiology scenario:


CTmax_ij = mu_i + beta * acclim_temp_j + epsilon_ij

where:
  mu_i        ~ phylogenetic BM (species-level intercept)
  acclim_temp ~ experimental condition (observation-level covariate)
  beta        = within-species slope (swept: 0, 0.5, 1.0, 1.5)
  epsilon_ij  ~ N(0, sigma^2) residual noise

Each species has multiple observations at different acclimation temperatures. The parameter beta controls the strength of the within-species covariate effect. When beta = 0, there is no within-species variation driven by the covariate; when beta > 0, the covariate contains information that should improve imputation.
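
The observation-level part of this data-generating process can be sketched in a few lines. This is a minimal illustration, not the benchmark driver: the species intercepts are drawn i.i.d. as a stand-in for the phylogenetic BM draw (which needs a tree), and the function name, temperature grid, and sigma value are all assumptions made for the example.

```python
import random

def simulate_ctmax(n_species=200, temps=(15.0, 20.0, 25.0, 30.0, 35.0),
                   beta=0.5, sigma=1.0, seed=1):
    """Simulate CTmax_ij = mu_i + beta * acclim_temp_j + epsilon_ij.

    ASSUMPTION: mu_i is drawn i.i.d. N(0, 1) as a stand-in for the
    phylogenetic BM draw; the temperature grid and sigma are
    illustrative values, not the benchmark's settings.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_species):
        mu_i = rng.gauss(0.0, 1.0)        # species-level intercept
        for t in temps:                   # one observation per condition
            eps = rng.gauss(0.0, sigma)   # residual noise
            rows.append({"species": i, "acclim_temp": t,
                         "ctmax": mu_i + beta * t + eps})
    return rows

rows = simulate_ctmax()
print(len(rows))  # 200 species x 5 observations per species
```

With beta = 0 the acclimation temperature drops out of the formula entirely, which is why the beta = 0 scenarios serve as the control for an uninformative covariate.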

Methods compared

Method	Description
Mean imputation	Column mean of observed values. Ignores phylogeny and covariates.
pigauto (no covariates)	Phylogenetic BM baseline + GNN correction. Uses the tree and cross-trait correlations but no observation-level covariates.
pigauto + obs-level covariates	Same architecture, but the acclimation temperature is supplied as an observation-level covariate. The refinement MLP can learn within-species adjustments.
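
For reference, the mean-imputation baseline amounts to the following. This is a sketch of the general technique on a single trait column, with missing cells encoded as None; the function name is hypothetical and this is not the benchmark's own code.

```python
def mean_impute(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [x for x in column if x is not None]
    col_mean = sum(observed) / len(observed)
    return [col_mean if x is None else x for x in column]

print(mean_impute([40.0, None, 42.0, None]))  # [40.0, 41.0, 42.0, 41.0]
```

Every missing cell in a column receives the same value, so this baseline cannot distinguish species or conditions.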

3. Results

Observation-level RMSE

RMSE on held-out cells, averaged across the 2 replicates. Lower is better; the best method per scenario is the lowest value in its row.

Scenario	Mean imputation	pigauto (no covariates)	pigauto + obs-level covariates
lambda=0.5 beta=0.0 miss=50%	1.123	0.891	0.890
lambda=0.9 beta=0.0 miss=50%	1.382	1.090	1.101
lambda=0.5 beta=0.5 miss=50%	2.927	2.967	2.226
lambda=0.9 beta=0.5 miss=50%	3.060	3.006	2.320
lambda=0.5 beta=1.0 miss=50%	5.107	6.214	4.003
lambda=0.9 beta=1.0 miss=50%	5.537	6.164	4.220
lambda=0.5 beta=0.0 miss=80%	1.413	1.081	1.072
lambda=0.9 beta=0.0 miss=80%	1.975	1.552	1.557
lambda=0.5 beta=0.5 miss=80%	2.925	3.044	2.292
lambda=0.9 beta=0.5 miss=80%	3.063	3.223	2.495
lambda=0.5 beta=1.0 miss=80%	5.228	5.524	3.984
lambda=0.9 beta=1.0 miss=80%	5.452	5.666	4.111
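
The metric in this table is RMSE restricted to held-out cells. A minimal sketch, assuming a boolean mask that flags the cells hidden from the model before fitting (the function and variable names are illustrative):

```python
import math

def heldout_rmse(truth, pred, held_out):
    """RMSE over the cells flagged as held out."""
    sq = [(t - p) ** 2 for t, p, h in zip(truth, pred, held_out) if h]
    return math.sqrt(sum(sq) / len(sq))

truth    = [40.0, 41.0, 43.0, 44.0]
pred     = [40.5, 42.0, 43.0, 46.0]
held_out = [False, True, False, True]

# Only cells 1 and 3 count: errors of 1 and 2, so RMSE = sqrt((1 + 4) / 2).
print(heldout_rmse(truth, pred, held_out))
```

Scoring only the masked cells keeps the metric from being flattered by cells the model was allowed to see.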

Pearson r (observed vs predicted)

Higher is better; the best method per scenario is the highest value in its row.

Scenario	Mean imputation	pigauto (no covariates)	pigauto + obs-level covariates
lambda=0.5 beta=0.0 miss=50%	0.3781	0.6754	0.6744
lambda=0.9 beta=0.0 miss=50%	0.3523	0.6846	0.6775
lambda=0.5 beta=0.5 miss=50%	0.1267	0.2903	0.6553
lambda=0.9 beta=0.5 miss=50%	0.1570	0.3471	0.6508
lambda=0.5 beta=1.0 miss=50%	0.0729	0.0247	0.6414
lambda=0.9 beta=1.0 miss=50%	0.0644	0.0814	0.6547
lambda=0.5 beta=0.0 miss=80%	0.2043	0.7041	0.7057
lambda=0.9 beta=0.0 miss=80%	0.2048	0.6496	0.6480
lambda=0.5 beta=0.5 miss=80%	0.0599	0.1836	0.6308
lambda=0.9 beta=0.5 miss=80%	0.0676	0.2254	0.5747
lambda=0.5 beta=1.0 miss=80%	-0.0051	0.1100	0.6522
lambda=0.9 beta=1.0 miss=80%	0.0448	0.1397	0.6808
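
Pearson r complements RMSE because it ignores offset and scale: predictions that track the truth but are shifted by a constant still score r = 1. A small sketch of the standard formula (plain Python, names illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

observed  = [1.0, 2.0, 3.0]
predicted = [6.0, 7.0, 8.0]   # right ordering, constant offset of 5
print(pearson_r(observed, predicted))  # 1.0
```

Reading the two tables together therefore separates ranking quality (r) from absolute error (RMSE).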

Covariate lift

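RMSE ratio of pigauto + covariates relative to pigauto (no covariates). Values < 1 indicate observation-level covariates help. Lift is (1 - Ratio) expressed as a percentage, so a positive lift means lower RMSE with covariates.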

Scenario	RMSE (no cov)	RMSE (+ cov)	Ratio	Lift
lambda=0.5 beta=0.0 miss=50%	0.891	0.890	0.999	+0.1%
lambda=0.9 beta=0.0 miss=50%	1.090	1.101	1.010	-1.0%
lambda=0.5 beta=0.5 miss=50%	2.967	2.226	0.750	+25.0%
lambda=0.9 beta=0.5 miss=50%	3.006	2.320	0.772	+22.8%
lambda=0.5 beta=1.0 miss=50%	6.214	4.003	0.644	+35.6%
lambda=0.9 beta=1.0 miss=50%	6.164	4.220	0.685	+31.5%
lambda=0.5 beta=0.0 miss=80%	1.081	1.072	0.992	+0.8%
lambda=0.9 beta=0.0 miss=80%	1.552	1.557	1.003	-0.3%
lambda=0.5 beta=0.5 miss=80%	3.044	2.292	0.753	+24.7%
lambda=0.9 beta=0.5 miss=80%	3.223	2.495	0.774	+22.6%
lambda=0.5 beta=1.0 miss=80%	5.524	3.984	0.721	+27.9%
lambda=0.9 beta=1.0 miss=80%	5.666	4.111	0.726	+27.4%
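
The Ratio and Lift columns follow directly from the two RMSE columns. The sketch below recomputes two rows of the table; the function name is illustrative.

```python
def covariate_lift(rmse_no_cov, rmse_cov):
    """Return (ratio, lift in percent) for the covariate variant."""
    ratio = rmse_cov / rmse_no_cov
    lift_pct = (1.0 - ratio) * 100.0
    return ratio, lift_pct

# lambda=0.5 beta=0.5 miss=50%
ratio, lift = covariate_lift(2.967, 2.226)
print(f"{ratio:.3f} {lift:+.1f}%")  # 0.750 +25.0%

# lambda=0.9 beta=0.0 miss=50%
ratio, lift = covariate_lift(1.090, 1.101)
print(f"{ratio:.3f} {lift:+.1f}%")  # 1.010 -1.0%
```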

4. Key findings

When beta > 0 (within-species covariate effect exists), pigauto with observation-level covariates has lower RMSE than pigauto without covariates in all eight such scenarios of this simulation. The lift grows with beta (22.6% to 25.0% at beta = 0.5, 27.4% to 35.6% at beta = 1.0) because stronger covariate effects provide more information for the model to exploit.

When beta = 0 (no within-species covariate effect), the two pigauto variants produce nearly identical RMSE: the ratio stays between 0.992 and 1.010 across the four beta = 0 scenarios, so the uninformative covariate neither helps nor hurts appreciably in this run.

Species-level phylogenetic imputation is necessary but not sufficient. The BM baseline captures inter-species variation driven by shared evolutionary history, but when within-species variation is structured by an experimental condition, species-level imputation leaves that structure unexploited. In this run, pigauto without covariates has higher RMSE than mean imputation in seven of the eight beta > 0 scenarios.
Observation-level covariates fill the gap. By conditioning on the experimental condition (acclimation temperature in this simulation), pigauto can produce different imputed values for different observations of the same species, matching the true data-generating process.
Safety when covariates are uninformative. The gated architecture can fall back to the phylogenetic baseline when the covariate has no predictive value in this simulation (beta = 0). Treat covariates as a validation-checked addition, not an automatic improvement.

5. Architecture: observation-level refinement

Standard phylogenetic imputation operates at the species level: one prediction per species per trait. To handle multiple observations per species, pigauto uses a two-stage architecture:

Species-level message passing. The GNN aggregates observations within each species (scatter_mean), performs phylogenetic message passing on the species graph, then broadcasts the species-level representation back to observations (index_select). This captures inter-species structure from the phylogenetic tree.
Observation-level refinement MLP. A small feedforward network takes the species-level representation and concatenates it with the observation-level covariates. This allows the model to learn within-species adjustments: the same species gets different predictions at different covariate values.


# Schematic of the multi-obs + covariate pipeline
#
#   observations          species level          observations
#   (n_obs x p)           (n_species x d)        (n_obs x p)
#       |                      |                      |
#   scatter_mean  --->  GNN message passing  --->  broadcast
#                                                     |
#                                              concat obs covariates
#                                                     |
#                                              refinement MLP
#                                                     |
#                                                  delta_obs
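
The aggregate-then-broadcast steps in the schematic can be illustrated without any deep-learning library. The helpers below mimic what scatter_mean and index_select do on a single trait; they are plain-Python stand-ins written for this example, not pigauto's implementation, and the GNN message-passing step between them is omitted.

```python
def scatter_mean(values, species_idx, n_species):
    """Average observation values within each species."""
    sums = [0.0] * n_species
    counts = [0] * n_species
    for v, s in zip(values, species_idx):
        sums[s] += v
        counts[s] += 1
    # Species with no observations get 0.0 (an arbitrary choice here).
    return [t / c if c else 0.0 for t, c in zip(sums, counts)]

def broadcast(species_values, species_idx):
    """Copy each species-level value back to its observations."""
    return [species_values[s] for s in species_idx]

obs         = [40.0, 42.0, 35.0, 37.0, 39.0]
species_idx = [0, 0, 1, 1, 1]          # observation -> species lookup

species_mean = scatter_mean(obs, species_idx, n_species=2)
print(species_mean)                          # [41.0, 37.0]
print(broadcast(species_mean, species_idx))  # [41.0, 41.0, 37.0, 37.0, 37.0]
```

After the broadcast, every observation of a species carries the same representation, which is why the refinement MLP needs the observation-level covariates to tell them apart.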

The final prediction is still the gated blend:


pred = (1 - r_cal) * baseline + r_cal * delta_obs

where baseline is the phylogenetic BM prediction (species-level, broadcast to observations), delta_obs is the observation-level GNN output that incorporates both phylogenetic structure and covariate information, and r_cal is the gate weight. At r_cal = 0 the prediction reduces exactly to the baseline.
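
A small numeric sketch of the blend makes the fallback behaviour concrete. It assumes a single scalar r_cal between 0 and 1 applied to all observations; how pigauto actually parameterises and calibrates the gate is not specified here, and the numbers are made up for illustration.

```python
def gated_blend(baseline, delta_obs, r_cal):
    """pred = (1 - r_cal) * baseline + r_cal * delta_obs, elementwise."""
    return [(1.0 - r_cal) * b + r_cal * d
            for b, d in zip(baseline, delta_obs)]

baseline  = [41.0, 41.0]   # same species, so the same BM prediction
delta_obs = [39.0, 43.0]   # covariate-aware output differs per observation

print(gated_blend(baseline, delta_obs, 0.0))  # [41.0, 41.0]
print(gated_blend(baseline, delta_obs, 0.5))  # [40.0, 42.0]
```

With the gate closed the two observations are indistinguishable; opening it lets the covariate-dependent term separate them.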

6. Reproducibility

Driver: script/bench_multi_obs.R. Tree: 200 species. Observations per species: 5. Training: 200 epochs. To reproduce:


Rscript script/bench_multi_obs.R
Rscript script/make_bench_multi_obs_html.R

Generated 2026-05-11 10:58 by script/make_bench_multi_obs_html.R from pre-computed results in script/bench_multi_obs.rds.