
Simulate a functional-biogeography GLLVM dataset (sites × species × traits)
Source: R/simulate-site-trait.R
Simulates a three-dimensional dataset from the Nakagawa et al. (in prep) functional-biogeography model:
$$y_{sit} = \alpha_t + x_s'\beta_t + r_{st} + u_{st} + e_{sit} + p_{it} + q_{it}$$
Usage
simulate_site_trait(
  n_sites = 50,
  n_species = 20,
  n_traits = 3,
  mean_species_per_site = 5,
  n_predictors = 2,
  alpha = NULL,
  beta = NULL,
  sigma2_eps = 0.5,
  Lambda_B = NULL,
  Lambda_W = NULL,
  S_B = NULL,
  S_W = NULL,
  sigma2_phy = NULL,
  sigma2_sp = NULL,
  Cphy = NULL,
  spatial_range = NULL,
  sigma2_spa = NULL,
  coords = NULL,
  seed = NULL
)
Arguments
- n_sites
Integer; number of sites \(S\).
- n_species
Integer; number of species \(I\) in the regional pool.
- n_traits
Integer; number of traits \(T\).
- mean_species_per_site
Average number of species observed per site (Poisson with this mean, truncated at \(n_{species}\)). Default 5.
- n_predictors
Integer; number of site-level predictors. Default 2.
- alpha
Optional length-n_traits vector of trait intercepts; random if NULL.
- beta
Optional n_traits x n_predictors matrix of trait-specific slopes; random if NULL.
- sigma2_eps
Residual variance (s_W-only term in the fixed-effects-only simulator). Default 0.5.
- Lambda_B, Lambda_W
Optional n_traits x d_B / n_traits x d_W loading matrices for between-site / within-site reduced-rank components. Set to NULL (default) to omit.
- S_B, S_W
Optional length-n_traits vectors of trait-specific specific variances at the global / local level.
- sigma2_phy, sigma2_sp
Optional length-n_traits vectors of phylogenetic and non-phylogenetic species variance. Cphy is required if sigma2_phy is supplied.
- Cphy
Optional n_species x n_species phylogenetic correlation matrix. If supplied, used to draw p_it.
- spatial_range, sigma2_spa
Optional spatial range and per-trait variance for an exponential spatial residual r_st. If both supplied and coords is NULL, sites are placed uniformly in [0, 1]^2.
- coords
Optional n_sites x 2 matrix of site coordinates (used when spatial_range is supplied).
- seed
Optional RNG seed.
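The structured terms described by spatial_range, sigma2_spa, Cphy and sigma2_phy can be illustrated with a short base-R sketch. This is not the package's internal code: the helper draw_correlated() is hypothetical, the correlation form exp(-d / spatial_range) is one common reading of "exponential" and is assumed here, and the toy Cphy is a stand-in rather than a matrix derived from a real phylogeny.

```r
# Illustrative sketch only; not the package's internal implementation.
set.seed(42)
n_sites <- 6
n_species <- 5
n_traits <- 3

# Hypothetical helper: column t of the result is one draw from
# N(0, variances[t] * C), where C is a correlation matrix.
draw_correlated <- function(C, variances) {
  L <- t(chol(C))  # lower Cholesky factor, so L %*% t(L) equals C
  vapply(variances,
         function(s2) sqrt(s2) * as.vector(L %*% rnorm(nrow(C))),
         numeric(nrow(C)))
}

# Spatial residual r_st: sites placed uniformly in [0, 1]^2, as the
# documentation describes when coords is NULL.
spatial_range <- 0.3
sigma2_spa <- c(0.5, 1.0, 0.2)
coords <- cbind(lon = runif(n_sites), lat = runif(n_sites))
D <- as.matrix(dist(coords))
R_spa <- exp(-D / spatial_range)  # assumed exponential correlation form
r_st <- draw_correlated(R_spa, sigma2_spa)  # n_sites x n_traits

# Phylogenetic species effect p_it: toy correlation matrix that decays
# with index distance (an assumption standing in for a real Cphy).
sigma2_phy <- c(0.3, 0.3, 0.6)
idx <- seq_len(n_species)
Cphy <- 0.5^abs(outer(idx, idx, "-"))
p_it <- draw_correlated(Cphy, sigma2_phy)  # n_species x n_traits

dim(r_st)
dim(p_it)
```

Each column of r_st and p_it corresponds to one trait, matching the length-n_traits variance vectors the arguments expect.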
Value
A list with components:
- data
Long-format data frame with one row per (site, species, trait) observation: columns site, species, trait, value, site_species, predictors env_1, …, env_n_predictors, plus lon and lat if coords were generated.
- truth
Named list of true parameter values (alpha, beta, Lambda_B, Lambda_W, S_B, S_W, sigma2_phy, sigma2_sp, sigma2_spa, spatial_range, sigma2_eps).
- Cphy
The phylogenetic correlation matrix used (or NULL).
- coords
Site coordinates used (or NULL).
Details
This simulator is domain-specific: it produces a (site, species, trait) cube, which is the canonical functional-biogeography layout. The package as a whole works for any (unit, trait) stacked-trait GLLVM (unit may be site, individual, species, paper, ...); this particular simulator targets the site × species × trait special case used in the methods paper. For simpler simulations from a generic (unit, trait) design, build the data inline as in the morphometrics article.
Each component can be turned on or off via the corresponding variance / loading argument. The default settings produce a fixed-effects-only dataset suitable for Stage-1 regression tests.
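To make the fixed-effects-only case concrete, the following base-R sketch builds a small long-format dataset from y = alpha_t + x_s' beta_t + e alone. It is a sketch under stated assumptions, not the function's actual code: the standard-normal draws for alpha, beta and the predictors, the capping of the Poisson count at the species pool, and the floor of one species per site are all illustrative choices.

```r
# Illustrative sketch only; not the package's internal implementation.
set.seed(1)
n_sites <- 4
n_species <- 6
n_traits <- 3
n_predictors <- 2
sigma2_eps <- 0.5

alpha <- rnorm(n_traits)                                        # trait intercepts
beta <- matrix(rnorm(n_traits * n_predictors), n_traits, n_predictors)
X <- matrix(rnorm(n_sites * n_predictors), n_sites, n_predictors)

# Species observed per site: Poisson count capped at the pool size
# (the floor of 1 is an added assumption so no site is empty).
n_obs <- pmin(pmax(rpois(n_sites, 3), 1), n_species)

rows <- lapply(seq_len(n_sites), function(s) {
  sp <- sort(sample.int(n_species, n_obs[s]))
  mu <- alpha + as.vector(beta %*% X[s, ])       # expected value per trait
  d <- expand.grid(trait = seq_len(n_traits), species = sp)
  d$site <- s
  d$value <- mu[d$trait] + rnorm(nrow(d), sd = sqrt(sigma2_eps))
  d
})
dat <- do.call(rbind, rows)

head(dat)
```

Every observed (site, species) pair contributes n_traits rows, so the data frame has n_traits * sum(n_obs) rows in total.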
Examples
set.seed(1)
sim <- simulate_site_trait(n_sites = 30, n_species = 8, n_traits = 3,
                           mean_species_per_site = 4)
head(sim$data)
#>   site species site_species   trait      value      env_1     env_2
#> 1    1       1          1_1 trait_1 -1.7358402 -0.3053884 0.7631757
#> 2    1       1          1_1 trait_2 -0.3452106 -0.3053884 0.7631757
#> 3    1       1          1_1 trait_3  0.2164905 -0.3053884 0.7631757
#> 4    1       2          1_2 trait_1 -1.1233488 -0.3053884 0.7631757
#> 5    1       2          1_2 trait_2 -0.5638725 -0.3053884 0.7631757
#> 6    1       2          1_2 trait_3  0.8311514 -0.3053884 0.7631757
sim$truth$alpha
#> [1] -0.6264538 0.1836433 -0.8356286