Covariate effectiveness simulation

This benchmark simulates traits with varying levels of phylogenetic signal (λ) and environmental effect (β) to demonstrate when environmental covariates improve imputation accuracy. Covariates help most when phylogenetic signal is low and environmental effects are strong.
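To make the roles of λ and β concrete, here is a minimal sketch of one way such a trait could be simulated. It is not taken from the benchmark script: the additive model (phylogenetic component plus β times a standard-normal environmental covariate), the toy four-species covariance matrix, and all function names are assumptions for illustration. The only standard piece is Pagel's λ transform, which multiplies the off-diagonal entries of the phylogenetic covariance matrix by λ.

```python
import math
import random

# Hypothetical toy phylogenetic covariance for 4 species on a balanced tree
# of depth 1: species (0,1) and (2,3) each share half of their history.
TOY_VCV = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]

def pagel_lambda(vcv, lam):
    """Scale off-diagonal covariances by lam; variances stay unchanged."""
    n = len(vcv)
    return [[vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
            for i in range(n)]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def simulate_trait(vcv, lam, beta, rng):
    """Assumed model: trait = phylogenetic component + beta * env covariate."""
    n = len(vcv)
    low = cholesky(pagel_lambda(vcv, lam))
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated draw with covariance equal to the lambda-transformed matrix.
    phylo = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    env = [rng.gauss(0.0, 1.0) for _ in range(n)]
    trait = [phylo[i] + beta * env[i] for i in range(n)]
    return trait, env

rng = random.Random(1)
trait, env = simulate_trait(TOY_VCV, lam=0.1, beta=1.5, rng=rng)
```

Under this assumed model, a small λ leaves little shared covariance between relatives for a phylogeny-only imputer to exploit, while a large β puts most of the trait variance into the covariate, which is the regime where covariates would be expected to help.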

Setup

Key finding: Covariates reduce RMSE by up to 11.4% (low phylo, strong env scenario: λ=0.1, β=1.5, 40% missing). When phylogenetic signal is high and environmental effects are absent (λ=0.9, β=0), covariates provide essentially no lift (ratio 0.996 to 0.997), consistent with the gated safety: uninformative covariates do not degrade accuracy.

RMSE by scenario, missingness, and method

| Scenario | λ | β | Missing % | Method | RMSE | Pearson r |
|---|---|---|---|---|---|---|
| Low phylo, strong env | 0.1 | 1.5 | 20 | pigauto | 2.7136 | 0.4208 |
| Low phylo, strong env | 0.1 | 1.5 | 20 | pigauto + covs | 2.5046 | 0.4964 |
| Low phylo, strong env | 0.1 | 1.5 | 40 | pigauto | 2.8370 | 0.3365 |
| Low phylo, strong env | 0.1 | 1.5 | 40 | pigauto + covs | 2.5128 | 0.4887 |
| Low phylo, strong env | 0.1 | 1.5 | 60 | pigauto | 2.8860 | 0.3274 |
| Low phylo, strong env | 0.1 | 1.5 | 60 | pigauto + covs | 2.5846 | 0.4782 |
| Moderate phylo, strong env | 0.3 | 1.0 | 20 | pigauto | 1.5049 | 0.2497 |
| Moderate phylo, strong env | 0.3 | 1.0 | 20 | pigauto + covs | 1.4635 | 0.3010 |
| Moderate phylo, strong env | 0.3 | 1.0 | 40 | pigauto | 1.5298 | 0.2268 |
| Moderate phylo, strong env | 0.3 | 1.0 | 40 | pigauto + covs | 1.4739 | 0.2782 |
| Moderate phylo, strong env | 0.3 | 1.0 | 60 | pigauto | 1.6484 | 0.1464 |
| Moderate phylo, strong env | 0.3 | 1.0 | 60 | pigauto + covs | 1.6004 | 0.2008 |
| High phylo, moderate env | 0.7 | 0.5 | 20 | pigauto | 1.4887 | 0.2104 |
| High phylo, moderate env | 0.7 | 0.5 | 20 | pigauto + covs | 1.4369 | 0.2773 |
| High phylo, moderate env | 0.7 | 0.5 | 40 | pigauto | 1.4657 | 0.2189 |
| High phylo, moderate env | 0.7 | 0.5 | 40 | pigauto + covs | 1.3757 | 0.2858 |
| High phylo, moderate env | 0.7 | 0.5 | 60 | pigauto | 1.4837 | 0.1329 |
| High phylo, moderate env | 0.7 | 0.5 | 60 | pigauto + covs | 1.4317 | 0.2012 |
| High phylo, no env | 0.9 | 0.0 | 20 | pigauto | 0.5716 | 0.5804 |
| High phylo, no env | 0.9 | 0.0 | 20 | pigauto + covs | 0.5695 | 0.5809 |
| High phylo, no env | 0.9 | 0.0 | 40 | pigauto | 0.6105 | 0.6280 |
| High phylo, no env | 0.9 | 0.0 | 40 | pigauto + covs | 0.6085 | 0.6284 |
| High phylo, no env | 0.9 | 0.0 | 60 | pigauto | 0.6476 | 0.5886 |
| High phylo, no env | 0.9 | 0.0 | 60 | pigauto + covs | 0.6459 | 0.5888 |
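
For reference, the two accuracy metrics reported above follow their standard definitions, sketched below. The sketch assumes both are computed between the true values and the imputed values of the held-out (masked) cells; the report does not state this explicitly, and the function names are illustrative.

```python
import math

def rmse(truth, imputed):
    """Root mean squared error between true and imputed values."""
    sq_err = [(t - p) ** 2 for t, p in zip(truth, imputed)]
    return math.sqrt(sum(sq_err) / len(sq_err))

def pearson_r(truth, imputed):
    """Pearson correlation between true and imputed values."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(imputed) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, imputed))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in imputed)
    return cov / math.sqrt(var_t * var_p)
```

Lower RMSE and higher Pearson r both indicate better imputation. RMSE depends on the scale of the trait, so it is comparable within a scenario but not across scenarios with different trait variance.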

Covariate lift (RMSE ratio: with covariates / without)

Ratio < 1.0 means covariates improve imputation; ratio ≈ 1.0 means no effect.

| Scenario | λ | β | Missing % | RMSE (no cov) | RMSE (cov) | Ratio | Improvement (%) |
|---|---|---|---|---|---|---|---|
| Low phylo, strong env | 0.1 | 1.5 | 20 | 2.7136 | 2.5046 | 0.923 | 7.7 |
| Low phylo, strong env | 0.1 | 1.5 | 40 | 2.8370 | 2.5128 | 0.886 | 11.4 |
| Low phylo, strong env | 0.1 | 1.5 | 60 | 2.8860 | 2.5846 | 0.896 | 10.4 |
| Moderate phylo, strong env | 0.3 | 1.0 | 20 | 1.5049 | 1.4635 | 0.972 | 2.8 |
| Moderate phylo, strong env | 0.3 | 1.0 | 40 | 1.5298 | 1.4739 | 0.963 | 3.7 |
| Moderate phylo, strong env | 0.3 | 1.0 | 60 | 1.6484 | 1.6004 | 0.971 | 2.9 |
| High phylo, moderate env | 0.7 | 0.5 | 20 | 1.4887 | 1.4369 | 0.965 | 3.5 |
| High phylo, moderate env | 0.7 | 0.5 | 40 | 1.4657 | 1.3757 | 0.939 | 6.1 |
| High phylo, moderate env | 0.7 | 0.5 | 60 | 1.4837 | 1.4317 | 0.965 | 3.5 |
| High phylo, no env | 0.9 | 0.0 | 20 | 0.5716 | 0.5695 | 0.996 | 0.4 |
| High phylo, no env | 0.9 | 0.0 | 40 | 0.6105 | 0.6085 | 0.997 | 0.3 |
| High phylo, no env | 0.9 | 0.0 | 60 | 0.6476 | 0.6459 | 0.997 | 0.3 |
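
The ratio and improvement columns follow directly from the two RMSE columns. A minimal sketch of the arithmetic, using the λ=0.1, β=1.5, 40% missing row as the worked case (the function name is illustrative, not part of pigauto):

```python
def covariate_lift(rmse_no_cov, rmse_cov):
    """Return (ratio, improvement in percent) for a pair of RMSE values."""
    ratio = rmse_cov / rmse_no_cov
    improvement_pct = (1.0 - ratio) * 100.0
    return ratio, improvement_pct

ratio, improvement = covariate_lift(2.8370, 2.5128)
print(f"{ratio:.3f} {improvement:.1f}%")  # prints "0.886 11.4%"
```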

Interpretation


Generated 2026-05-11 10:58 by script/make_bench_covariate_sim_html.R