
Simulates trait data under several evolutionary models, introduces missing data, fits both the Brownian motion (BM) baseline and the full pigauto GNN, and compares their performance. This is the recommended way to assess pigauto on data with known properties before applying it to real data.

Usage

simulate_benchmark(
  n_species = 100L,
  n_traits = 4L,
  scenarios = c("BM", "OU", "regime_shift", "nonlinear", "mixed"),
  missing_frac = 0.25,
  n_reps = 3L,
  epochs = 500L,
  verbose = TRUE,
  ...
)

Arguments

n_species

integer. Number of tips in the simulated tree (default 100).

n_traits

integer. Number of continuous traits (default 4). Ignored for the "mixed" scenario, which generates a fixed trait set.

scenarios

character vector. Subset of c("BM", "OU", "regime_shift", "nonlinear", "mixed"). Default runs all.

missing_frac

numeric. Fraction of observed cells held out (default 0.25).
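As a rough illustration of what holding out a fraction of observed cells can look like, the base R sketch below masks 25% of the non-missing entries of a trait matrix. The helper mask_cells() is hypothetical and is not part of pigauto; the package's own masking logic may differ (for example in how cells are sampled across traits).

```r
# Hypothetical helper, NOT part of pigauto: hide a fraction of the observed cells.
mask_cells <- function(x, missing_frac = 0.25) {
  observed <- which(!is.na(x))                      # linear indices of observed cells
  n_hold <- floor(missing_frac * length(observed))  # number of cells to hide
  held_out <- observed[sample.int(length(observed), n_hold)]
  masked <- x
  masked[held_out] <- NA                            # held-out cells become missing
  list(masked = masked, held_out = held_out)
}

set.seed(1)
traits <- matrix(rnorm(100 * 4), nrow = 100, ncol = 4)  # 100 species x 4 traits
traits[1:8, 1] <- NA                                    # cells that were never observed
out <- mask_cells(traits, missing_frac = 0.25)

length(out$held_out)      # number of held-out cells
sum(is.na(out$masked))    # never-observed cells plus held-out cells
```

Only cells that were observed to begin with are eligible for masking, so the true values of the held-out cells remain available for scoring the imputations.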

n_reps

integer. Number of replicate trees per scenario (default 3).

epochs

integer. Maximum GNN training epochs (default 500).

verbose

logical. Print progress (default TRUE).

...

additional arguments passed to fit_pigauto.

Value

An object of class "pigauto_benchmark" with:

results

data.frame with columns: scenario, rep, method, trait, type, metric, value, n_test.

summary

data.frame averaged across replicates.

scenarios

character vector of scenarios run.

n_reps

integer. Number of replicate trees per scenario.

n_species

integer. Number of tips in each simulated tree.
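To make the relationship between results and summary concrete, here is a minimal base R sketch that averages a long-format table over replicates. The data frame is a mock with the documented column names; every value in it, along with the method labels "BM" and "pigauto", the metric name "rmse", and the trait and type labels, is an assumption made for illustration, and the package may compute its summary differently.

```r
# Mock results with the documented columns; all values are invented for illustration.
results <- data.frame(
  scenario = "BM",
  rep      = rep(1:2, each = 2),
  method   = rep(c("BM", "pigauto"), times = 2),  # assumed method labels
  trait    = "trait1",                            # assumed trait name
  type     = "continuous",                        # assumed type label
  metric   = "rmse",                              # assumed metric name
  value    = c(0.50, 0.48, 0.54, 0.50),
  n_test   = 25L
)

# Average over replicates within each scenario/method/trait/type/metric combination.
summary_tbl <- aggregate(
  value ~ scenario + method + trait + type + metric,
  data = results,
  FUN = mean
)
summary_tbl
```

Keeping results in long format (one row per scenario, replicate, method, trait, and metric) means the same aggregate() call works however many scenarios or metrics are present.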

Details

Available scenarios:

"BM"

Pure Brownian motion – the baseline is exact, so the GNN should tie or slightly improve via inter-trait correlations.

"OU"

Ornstein-Uhlenbeck – stabilising selection constrains variation, so a BM model over-estimates evolutionary variance.

"regime_shift"

Two-regime BM – clade-specific optima create bimodal distributions that BM cannot capture.

"nonlinear"

Non-linear inter-trait relationships – the GNN's multi-layer message passing can capture quadratic and interaction effects that BM's linear covariance misses.

"mixed"

Mixed trait types: 2 continuous + 1 binary + 1 categorical (3 levels). Tests the full type pipeline.
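The contrast between the "BM" and "OU" scenarios can be sketched without a tree by simulating many independent lineages with a simple Euler discretisation. This is not how pigauto generates its benchmark data; the parameter values (sigma, alpha, theta, and the time grid) are assumptions chosen only to show that variance keeps growing under BM while OU variance levels off.

```r
set.seed(42)
n_lineages <- 2000   # independent lineages, no shared ancestry
n_steps    <- 1000
dt         <- 0.01   # total elapsed time = n_steps * dt = 10
sigma      <- 1      # diffusion rate (assumed)
alpha      <- 1      # OU strength of pull towards the optimum (assumed)
theta      <- 0      # OU optimum (assumed)

bm <- numeric(n_lineages)   # all lineages start at 0
ou <- numeric(n_lineages)
for (i in seq_len(n_steps)) {
  # BM: pure random walk
  bm <- bm + sigma * sqrt(dt) * rnorm(n_lineages)
  # OU: random walk plus a deterministic pull towards theta
  ou <- ou + alpha * (theta - ou) * dt + sigma * sqrt(dt) * rnorm(n_lineages)
}

var_bm <- var(bm)   # theory: sigma^2 * t = 10
var_ou <- var(ou)   # theory: roughly sigma^2 / (2 * alpha) = 0.5
c(BM = var_bm, OU = var_ou)
```

A model that assumes BM expects variance to accumulate in proportion to elapsed time, which is why it over-estimates variance when the data were generated under OU.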

Examples

if (FALSE) { # \dontrun{
bench <- simulate_benchmark(n_species = 50, epochs = 200, n_reps = 2)
bench$summary
plot(bench)
} # }