Dispatches to pigauto's phylogenetic baseline machinery and returns imputed latent-scale means and standard errors for every species.
Usage
fit_baseline(
  data,
  tree,
  splits = NULL,
  model = "BM",
  graph = NULL,
  multi_obs_aggregation = c("hard", "soft"),
  em_iterations = 0L,
  em_tol = 0.001,
  em_offdiag = FALSE
)
Arguments
- data
object of class "pigauto_data".
- tree
object of class "phylo".
- splits
list (output of make_missing_splits) or NULL.
- model
character. Evolutionary model: "BM" (default) or "OU".
- graph
optional list returned by build_phylo_graph. When supplied, graph$D (cophenetic distances) is reused for label propagation and graph$R_phy (phylogenetic correlation matrix) is reused for BM imputation, avoiding duplicate \(O(n^2)\) allocations. When NULL (default), both matrices are computed here.
- multi_obs_aggregation
character. How to aggregate multiple observations per species before the Level-C (Rphylopars) baseline: "hard" (default) thresholds binary proportions at 0.5 and uses argmax for categorical, matching Phase 10 behaviour. "soft" preserves species-level proportions and dispatches the truncated-Gaussian soft E-step (estep_liability_binary_soft) so that intermediate class frequencies contribute fractional liability evidence. Only relevant for multi-obs data with binary or categorical traits when the Level-C joint baseline is active.
- em_iterations
integer. Number of Phase 6 EM iterations for the threshold-joint baseline (binary + ordinal + OVR categorical). Default 0L disables the EM loop and preserves v0.9.1 output byte-for-byte. When >= 1, the BM rate \(\Sigma\) learned by Rphylopars::phylopars() at iteration \(k\) is fed back as the per-trait prior SD at iteration \(k+1\), up to em_iterations times or until em_tol convergence. em_iterations = 1L is a degenerate single-pass run and produces the same baseline output as 0L; >= 2L is needed for actual iteration. Only affects the threshold-joint path (continuous-only traits pass through the existing joint MVN path unchanged).
- em_tol
numeric. Relative-Frobenius convergence tolerance for the Phase 6 / 7 EM loop. Early-stops when \(\|\Sigma_k - \Sigma_{k-1}\|_F / \|\Sigma_{k-1}\|_F <\) em_tol. Default 1e-3.
- em_offdiag
logical. Phase 7 opt-in: when TRUE AND em_iterations >= 2L, each liability cell's prior at iteration \(k+1\) is the conditional-MVN \((\mu, sd)\) given the posterior liability of other traits at iteration \(k\), using the full off-diagonal entries of \(\Sigma\). Binary + ordinal only (OVR categorical stays on Phase 6 diagonal). Default FALSE preserves Phase 6 behaviour.
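Two of the argument descriptions above reduce to small computations that may be easier to read as code. The base-R sketch below is illustrative only: aggregate_binary(), rel_frobenius() and update_sigma() are hypothetical helpers, not pigauto functions, and the toy update rule merely stands in for a refit. The first part contrasts "hard" and "soft" aggregation of repeated binary observations (how a proportion of exactly 0.5 is resolved is an assumption here); the second part shows the relative-Frobenius stopping rule that em_tol describes.

```r
# Illustrative sketch; none of these helpers are part of pigauto.

# Part 1: per-species aggregation of repeated binary observations.
# "hard" thresholds the species-level proportion at 0.5,
# "soft" keeps the proportion itself.
aggregate_binary <- function(y, species, mode = c("hard", "soft")) {
  mode <- match.arg(mode)
  prop <- tapply(y, species, mean, na.rm = TRUE)   # share of 1s per species
  prop <- setNames(as.numeric(prop), names(prop))
  # Assumption: a proportion of exactly 0.5 maps to 0 in this sketch.
  if (mode == "hard") prop[] <- as.numeric(prop > 0.5)
  prop
}

y       <- c(1, 1, 0, 0, 0, 1, 1)
species <- c("a", "a", "a", "b", "b", "b", "c")
hard <- aggregate_binary(y, species, "hard")
soft <- aggregate_binary(y, species, "soft")

# Part 2: relative-Frobenius stopping rule for an iterated covariance.
rel_frobenius <- function(S_new, S_old) {
  norm(S_new - S_old, type = "F") / norm(S_old, type = "F")
}

# Hypothetical stand-in for one EM refit: moves Sigma halfway to a target.
S_target <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
update_sigma <- function(S) 0.5 * (S + S_target)

em_iterations <- 50L
em_tol <- 1e-3
S <- diag(2)
n_done <- 0L
for (k in seq_len(em_iterations)) {
  S_next <- update_sigma(S)
  change <- rel_frobenius(S_next, S)
  S <- S_next
  n_done <- k
  if (change < em_tol) break   # early stop once the relative change is small
}
```

In this toy run the loop stops well before em_iterations is reached, which is the behaviour the em_tol description implies for a converging sequence of \(\Sigma\) estimates.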
Value
A list with:
- mu
Numeric matrix (n_species x p_latent), baseline means in latent scale.
- se
Numeric matrix (n_species x p_latent), standard errors.
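As a rough illustration of how the two matrices might be combined, the sketch below builds approximate 95% intervals on the latent scale. It assumes a normal approximation is acceptable for the cells of interest, which the documentation does not state, and it uses a made-up stand-in for the returned list rather than real fit_baseline() output.

```r
# Toy stand-in for the list returned by fit_baseline(); all values are invented.
dn <- list(c("sp1", "sp2"), c("trait1", "trait2"))
bl <- list(
  mu = matrix(c(0.2, -1.1, 0.7, 1.5), nrow = 2, dimnames = dn),
  se = matrix(c(0.1,  0.3, 0.2, 0.4), nrow = 2, dimnames = dn)
)

# Assumption: normal approximation on the latent scale.
z <- qnorm(0.975)
lower <- bl$mu - z * bl$se
upper <- bl$mu + z * bl$se

# Cell-wise interval width, same shape as mu and se.
width <- upper - lower
```

Because these intervals live on the latent scale, they would still need whatever back-transformation the preprocessing applied before being read in the original trait units.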
Details
When splits is supplied, the validation and test cells are masked to NA before fitting, so the baseline is evaluated under the same conditions as fit_pigauto.
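The masking step can be pictured with a few lines of base R. This is a sketch under assumptions: the field names val_idx and test_idx and the use of linear (column-major) cell indices are hypothetical here, since the structure returned by make_missing_splits is not described on this page.

```r
# Toy trait matrix (3 species x 2 traits); one cell is already missing.
X <- matrix(c(1.0, 2.0, NA, 4.0, 5.0, 6.0), nrow = 3,
            dimnames = list(c("sp1", "sp2", "sp3"), c("t1", "t2")))

# Hypothetical splits object: linear indices of held-out cells.
splits_toy <- list(val_idx = 1L, test_idx = 5L)

# Mask validation and test cells so the fit never sees them.
X_fit <- X
X_fit[c(splits_toy$val_idx, splits_toy$test_idx)] <- NA

# The held-out truth stays available in X for later scoring.
held_out_truth <- X[c(splits_toy$val_idx, splits_toy$test_idx)]
```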
Continuous-family columns use Brownian-motion conditional MVN baselines on the phylogenetic correlation matrix, either independently or through the joint MVN path when the data and optional dependencies support it. Binary, ordinal, categorical, and zero-inflated gate columns use the appropriate label-propagation or threshold/liability baseline candidates, with per-column fallbacks when a joint path is not available.
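For a single continuous trait, a Brownian-motion conditional MVN prediction can be sketched in base R as follows. This is a textbook-style illustration, not pigauto's implementation: the correlation matrix and trait values are invented, the rate estimate is a simple GLS plug-in, and uncertainty in the estimated root mean is ignored. In practice a matrix like R could come from ape::vcv(tree, corr = TRUE).

```r
# Toy phylogenetic correlation matrix for 4 species (invented values that
# correspond to an ultrametric tree: (A,B) and (C,D) are sister pairs).
sp <- c("A", "B", "C", "D")
R <- matrix(c(1.0, 0.8, 0.3, 0.3,
              0.8, 1.0, 0.3, 0.3,
              0.3, 0.3, 1.0, 0.6,
              0.3, 0.3, 0.6, 1.0), 4, 4, dimnames = list(sp, sp))

# Trait values on the latent scale; species B is missing.
y <- c(A = 1.2, B = NA, C = -0.4, D = 0.1)
obs <- !is.na(y)

R_oo <- R[obs, obs, drop = FALSE]    # observed x observed
R_mo <- R[!obs, obs, drop = FALSE]   # missing  x observed
R_mm <- R[!obs, !obs, drop = FALSE]  # missing  x missing

# GLS estimate of the root mean under BM.
ones <- rep(1, sum(obs))
mu0 <- sum(solve(R_oo, y[obs])) / sum(solve(R_oo, ones))

# Plug-in BM rate from the observed residuals.
resid <- y[obs] - mu0
sigma2 <- as.numeric(crossprod(resid, solve(R_oo, resid))) / (sum(obs) - 1)

# Conditional mean and variance of the missing species given the observed ones.
w <- solve(R_oo, t(R_mo))                       # regression weights
mu_mis <- mu0 + as.numeric(crossprod(w, resid))
var_mis <- sigma2 * (R_mm - R_mo %*% w)
se_mis <- sqrt(diag(var_mis))
```

Here the prediction for B is pulled from the root mean toward its close relative A, and its conditional standard error is smaller than the marginal one, which is the qualitative behaviour expected of a phylogenetic baseline.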
Examples
if (FALSE) { # \dontrun{
data(avonet300, tree300, package = "pigauto")
traits <- avonet300; rownames(traits) <- traits$Species_Key
traits$Species_Key <- NULL
pd <- preprocess_traits(traits, tree300)
splits <- make_missing_splits(pd$X_scaled, trait_map = pd$trait_map)
bl <- fit_baseline(pd, tree300, splits)
} # }
