
Split cells into train/val/test for imputation evaluation
Source: R/mask_missing.R
make_missing_splits.Rd
Randomly designates a fraction of cells as "missing" and splits them into
validation and test sets. When a trait_map is supplied, masking
operates at the original trait level – all latent columns belonging
to one trait are held out together (important for categorical traits).
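The trait-level masking described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the `latent_cols` list is a hypothetical stand-in for a trait map, cells are drawn uniformly at random (the MCAR case), and linear indices are assumed to follow R's column-major order.

```r
# Hypothetical sketch of trait-level masking; not the package's own code.
set.seed(555)
n <- 6                                              # number of species (rows)
# Hypothetical trait map: each original trait -> its latent column(s)
latent_cols <- list(mass = 1L, diet = 2:4, wing = 5L)
n_traits <- length(latent_cols)
p <- sum(lengths(latent_cols))                      # number of latent columns

missing_frac <- 0.25
val_frac <- 0.25

# Draw (species, trait) cells uniformly at random in original-trait space
n_cells <- n * n_traits
held_out <- sample.int(n_cells, size = round(missing_frac * n_cells))

# Split the held-out cells into validation and test sets
n_val <- round(val_frac * length(held_out))
val_idx_trait <- held_out[seq_len(n_val)]
test_idx_trait <- setdiff(held_out, val_idx_trait)

# Expand each (species, trait) cell to every latent column of that trait
expand_to_latent <- function(idx_trait) {
  pos <- arrayInd(idx_trait, .dim = c(n, n_traits)) # columns: row, trait
  out <- lapply(seq_len(nrow(pos)), function(k) {
    cols <- latent_cols[[pos[k, 2]]]
    pos[k, 1] + (cols - 1L) * n                     # column-major linear index
  })
  as.integer(unlist(out))
}
val_idx <- expand_to_latent(val_idx_trait)
test_idx <- expand_to_latent(test_idx_trait)

# Logical mask in latent space: TRUE = observed, FALSE = held out
mask <- matrix(TRUE, n, p)
mask[c(val_idx, test_idx)] <- FALSE
```

Because a held-out `diet` cell expands to latent columns 2 to 4 of the same row, a categorical trait is never partially visible to the imputer.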
Arguments
- X
numeric matrix (species x latent columns from
preprocess_traits). Used only for dimensions.
- missing_frac
numeric. Fraction of all (species, trait) cells to designate as missing (default
0.25).
- val_frac
numeric. Fraction of missing cells for validation (default
0.25); the rest become the test set.
- seed
integer. Random seed for reproducibility (default
555).
- trait_map
list of trait descriptors (from
pigauto_data). If NULL, masking is applied per latent column (v0.1 behaviour).
- mechanism
character. Missingness mechanism:
"MCAR" (default, uniform random), "MAR_trait" (trait-dependent), "MAR_phylo" (clade-structured), or "MNAR" (value-dependent).
- mechanism_args
named list of mechanism-specific parameters:
- For "MAR_trait": driver_col (integer, column index in X that drives missingness; default 1), beta (numeric, severity; default 2.0).
- For "MAR_phylo": n_clades (integer, number of high-missingness clades; default 2), p_clade (numeric, within-clade missingness probability; default 0.7), p_base (numeric, background missingness probability; default 0.1).
- For "MNAR": beta (numeric, severity; default 2.0).
- tree
object of class
"phylo". Required for mechanism = "MAR_phylo", ignored otherwise.
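To make the difference between the "MAR_trait" and "MNAR" mechanisms concrete, the sketch below draws missingness from a logistic model with severity beta. The logistic form, the intercept calibration, and the helper `miss_prob` are assumptions made for illustration; the page does not state how the package turns beta into probabilities.

```r
# Hypothetical sketch; the logistic form is an assumption, not the documented method.
set.seed(555)
n <- 200
X <- cbind(driver = rnorm(n), target = rnorm(n))
beta <- 2.0                # severity: larger values give stronger dependence
missing_frac <- 0.25

# Missingness probability that increases with z, calibrated so that the
# average probability equals target_frac
miss_prob <- function(z, beta, target_frac) {
  z <- as.numeric(scale(z))
  f <- function(a) mean(plogis(a + beta * z)) - target_frac
  a <- uniroot(f, interval = c(-20, 20))$root
  plogis(a + beta * z)
}

# MAR_trait-style: missingness in "target" depends on another column
p_mar <- miss_prob(X[, "driver"], beta, missing_frac)
# MNAR-style: missingness in "target" depends on its own value
p_mnar <- miss_prob(X[, "target"], beta, missing_frac)

miss_mar <- runif(n) < p_mar
miss_mnar <- runif(n) < p_mnar
```

Under MCAR every cell would share the same probability, so comparing imputation error across the three masks shows how sensitive a method is to the missingness mechanism.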
Value
A list with:
- val_idx
Integer vector of linear indices (latent space).
- test_idx
Integer vector of linear indices (latent space).
- val_idx_trait
Integer vector in original-trait space (if
trait_map supplied).
- test_idx_trait
Integer vector in original-trait space (if
trait_map supplied).
- n
Number of species (rows).
- p
Number of latent columns.
- n_traits
Number of original traits.
- mask
Logical matrix (n x p_latent).
TRUE = observed.
- mechanism
Character string of the mechanism used.
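A returned list of this shape might be used as follows. The `splits` object is built by hand with made-up indices rather than by calling the function, and the sketch assumes that the linear indices are column-major positions in the n x p latent matrix and that held-out cells are FALSE in the mask.

```r
# Hypothetical stand-in for the returned list; the index values are invented.
n <- 4
p <- 3
splits <- list(val_idx = c(2L, 7L), test_idx = c(5L, 12L), n = n, p = p)
splits$mask <- matrix(TRUE, n, p)                 # TRUE = observed
splits$mask[c(splits$val_idx, splits$test_idx)] <- FALSE

# Toy latent matrix standing in for the fully observed data
X <- matrix(seq_len(n * p) / 10, n, p)

# Hide validation and test cells before fitting an imputer
X_train <- X
X_train[!splits$mask] <- NA

# Convert linear indices to (row, column) pairs when needed
test_pos <- arrayInd(splits$test_idx, .dim = c(n, p))

# Score imputed values on the test cells only
X_hat <- X + 0.1                                  # stand-in for imputer output
rmse_test <- sqrt(mean((X_hat[splits$test_idx] - X[splits$test_idx])^2))
```

Validation cells would be scored the same way with `val_idx`, typically for model selection, leaving the test cells for the final error estimate.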