Generates trait data under several evolutionary models, hides a fraction of the observed values, fits both the Brownian motion (BM) baseline and the full pigauto GNN, and compares their accuracy on the held-out cells. This is the recommended way to assess pigauto on data with known properties before applying it to real data.
Usage
simulate_benchmark(
n_species = 100L,
n_traits = 4L,
scenarios = c("BM", "OU", "regime_shift", "nonlinear", "mixed"),
missing_frac = 0.25,
n_reps = 3L,
epochs = 500L,
verbose = TRUE,
...
)
Arguments
- n_species
integer. Number of tips in the simulated tree (default 100).
- n_traits
integer. Number of continuous traits (default 4). Ignored for scenario = "mixed", which generates a fixed trait set.
- scenarios
character vector. Subset of c("BM", "OU", "regime_shift", "nonlinear", "mixed"). Default runs all five.
- missing_frac
numeric. Fraction of observed cells held out for evaluation (default 0.25).
- n_reps
integer. Number of replicate trees per scenario (default 3).
- epochs
integer. Maximum GNN training epochs (default 500).
- verbose
logical. Print progress (default TRUE).
- ...
additional arguments passed to fit_pigauto().
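The idea behind `missing_frac` can be pictured with a small base-R sketch. It assumes that held-out cells are drawn uniformly at random from all observed (non-NA) cells of the species-by-trait matrix; the helper `mask_observed()` is hypothetical and not part of pigauto, which may sample differently (for example, per trait).

```r
# Hypothetical helper (not part of pigauto): hide a fraction of the
# observed cells of a trait matrix so they can serve as a test set.
mask_observed <- function(X, missing_frac = 0.25) {
  observed <- which(!is.na(X))                      # linear indices of observed cells
  n_test   <- round(missing_frac * length(observed)) # number of cells to hold out
  # Index into `observed` so a single observed cell is handled correctly
  test_idx <- observed[sample.int(length(observed), n_test)]
  X_masked <- X
  X_masked[test_idx] <- NA                          # hide the held-out cells
  list(X_masked = X_masked, test_idx = test_idx, truth = X[test_idx])
}

set.seed(1)
X <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)  # 100 species x 4 traits
held <- mask_observed(X, missing_frac = 0.25)
length(held$test_idx)       # 100 of the 400 observed cells are held out
sum(is.na(held$X_masked))   # 100
```

Cells that are already missing are never selected, so the fraction always refers to observed data, matching the wording of the argument.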
Value
An object of class "pigauto_benchmark" with:
- results
data.frame with columns: scenario, rep, method, trait, type, metric, value, n_test.
- summary
data.frame of results averaged across replicates.
- scenarios
character vector of scenarios run.
- n_reps
integer. Number of replicates per scenario.
- n_species
integer. Number of tips per simulated tree.
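Because `results` is in long format (presumably one row per scenario, replicate, method, trait and metric), a replicate-averaged table can be rebuilt or re-cut with base R's `aggregate()`. The sketch below uses a tiny hand-made data frame whose values are placeholders, not benchmark output, and assumes `summary` is a plain mean of `value` over `rep`. The method labels `"BM"` and `"pigauto"` and the metric name `"rmse"` are illustrative assumptions, not guaranteed labels.

```r
# Placeholder long-format results: 2 replicates x 2 methods x 1 trait.
# Values are made up purely to illustrate the reshaping.
results <- data.frame(
  scenario = "OU",
  rep      = c(1L, 1L, 2L, 2L),
  method   = c("BM", "pigauto", "BM", "pigauto"),  # assumed method labels
  trait    = "trait1",
  type     = "continuous",
  metric   = "rmse",                                # assumed metric name
  value    = c(1.0, 0.8, 1.2, 0.6),
  n_test   = 25L
)

# Average `value` over replicates, grouping by every other identifying column
summary_tbl <- aggregate(value ~ scenario + method + trait + type + metric,
                         data = results, FUN = mean)
summary_tbl
```

On a real benchmark the same call would take `bench$results` as `data`, which makes it easy to average over a different set of columns (for example, pooling traits) than the built-in summary does.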
Details
Available scenarios:
- "BM": Pure Brownian motion. The BM baseline matches the generating model, so the GNN is expected to tie it, or improve slightly by exploiting inter-trait correlations.
- "OU": Ornstein-Uhlenbeck. Stabilising selection pulls traits toward an optimum and bounds their variation, so BM over-estimates evolutionary variance.
- "regime_shift": Two-regime BM. Clade-specific optima create bimodal distributions that a single-regime BM model cannot capture.
- "nonlinear": Non-linear inter-trait relationships. The GNN's multi-layer message passing can capture quadratic and interaction effects that BM's linear covariance misses.
- "mixed": Mixed trait types: two continuous, one binary and one categorical (three levels) trait. Tests the full mixed-type pipeline.
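The contrast between the "BM" and "OU" scenarios can be seen without a tree by simulating many independent lineages over the same time span. Under BM the trait variance grows as sigma^2 * t, whereas under OU it levels off near sigma^2 / (2 * alpha). That plateau is why a BM model over-estimates variance when stabilising selection is at work. This is a plain Euler-Maruyama sketch with arbitrary parameter values (sigma = 1, alpha = 1, t = 10); it does not reproduce the internal simulation of simulate_benchmark().

```r
set.seed(42)
n_lineages <- 2000   # independent lineages (no shared tree)
t_total    <- 10     # total evolutionary time
n_steps    <- 1000
dt         <- t_total / n_steps
sigma      <- 1      # diffusion rate
alpha      <- 1      # strength of the OU pull toward the optimum
theta      <- 0      # OU optimum (also the root value)

x_bm <- numeric(n_lineages)
x_ou <- numeric(n_lineages)
for (i in seq_len(n_steps)) {
  dW   <- rnorm(n_lineages, sd = sqrt(dt))                   # same noise for both models
  x_bm <- x_bm + sigma * dW                                  # BM: pure diffusion
  x_ou <- x_ou + alpha * (theta - x_ou) * dt + sigma * dW    # OU: drift toward theta
}

c(bm = var(x_bm), bm_theory = sigma^2 * t_total,
  ou = var(x_ou), ou_theory = sigma^2 / (2 * alpha))
```

Feeding both processes the same increments `dW` isolates the effect of the drift term: any difference between `x_bm` and `x_ou` comes from the pull toward `theta`, not from different random draws.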
Examples
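# Not run: fitting the GNN for every scenario and replicate can be slow.
library(pigauto)
bench <- simulate_benchmark(n_species = 50, epochs = 200, n_reps = 2)
bench$summary   # results averaged across replicates
plot(bench)     # plot the benchmark results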
