Simulates a three-dimensional (site × species × trait) dataset from the Nakagawa et al. (in prep) functional-biogeography model: $$y_{sit} = \alpha_t + x_s'\beta_t + r_{st} + u_{st} + e_{sit} + p_{it} + q_{it}$$

Usage

simulate_site_trait(
  n_sites = 50,
  n_species = 20,
  n_traits = 3,
  mean_species_per_site = 5,
  n_predictors = 2,
  alpha = NULL,
  beta = NULL,
  sigma2_eps = 0.5,
  Lambda_B = NULL,
  Lambda_W = NULL,
  S_B = NULL,
  S_W = NULL,
  sigma2_phy = NULL,
  sigma2_sp = NULL,
  Cphy = NULL,
  spatial_range = NULL,
  sigma2_spa = NULL,
  coords = NULL,
  seed = NULL
)

Arguments

n_sites

Integer; number of sites \(S\).

n_species

Integer; number of species \(I\) in the regional pool.

n_traits

Integer; number of traits \(T\).

mean_species_per_site

Average number of species observed per site (Poisson with this mean, truncated at n_species). Default 5.

n_predictors

Integer; number of site-level predictors. Default 2.

alpha

Optional length-n_traits vector of trait intercepts; random if NULL.

beta

Optional n_traits x n_predictors matrix of trait-specific slopes; random if NULL.

sigma2_eps

Residual variance (the s_W-only term in the fixed-effects-only simulator). Default 0.5.

Lambda_B, Lambda_W

Optional n_traits x d_B / n_traits x d_W loading matrices for between-site / within-site reduced-rank components. Set to NULL (default) to omit.

S_B, S_W

Optional length-n_traits vectors of per-trait specific (unique) variances at the global (S_B) / local (S_W) level.

sigma2_phy, sigma2_sp

Optional length-n_traits vectors of phylogenetic and non-phylogenetic species variance. Cphy is required if sigma2_phy is supplied.

Cphy

Optional n_species x n_species phylogenetic correlation matrix. If supplied, used to draw p_it.

spatial_range, sigma2_spa

Optional spatial range and per-trait variance for an exponential spatial residual r_st. If both are supplied and coords is NULL, sites are placed uniformly in [0, 1]^2.

coords

Optional n_sites x 2 matrix of site coordinates (used when spatial_range is supplied).

seed

Optional RNG seed.
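
The spatial arguments can be pictured with a small base-R sketch. It assumes the exponential covariance takes the form sigma2_spa[t] * exp(-d / spatial_range) and that the fields are independent across traits; this page confirms neither detail, and the variable names mirror the arguments only for readability.

```r
# Sketch only: an exponential spatial residual r_st in base R.
# Assumption: Cov(r_st, r_s't) = sigma2_spa[t] * exp(-d(s, s') / spatial_range).
set.seed(42)
n_sites <- 30
n_traits <- 3
spatial_range <- 0.3
sigma2_spa <- c(0.5, 1.0, 0.2)

# Sites placed uniformly in the unit square, as described for coords = NULL.
coords <- cbind(lon = runif(n_sites), lat = runif(n_sites))

# Pairwise Euclidean distances and the exponential correlation matrix.
D <- as.matrix(dist(coords))
R_spa <- exp(-D / spatial_range)

# Draw one correlated field per trait via the Cholesky factor.
L <- t(chol(R_spa))  # lower-triangular, L %*% t(L) reproduces R_spa
z <- matrix(rnorm(n_sites * n_traits), n_sites, n_traits)
r <- (L %*% z) %*% diag(sqrt(sigma2_spa))  # scale column t by sqrt(sigma2_spa[t])

dim(r)  # n_sites x n_traits
```

A smaller spatial_range makes the correlation decay faster with distance, so neighbouring sites share less of the residual.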

Value

A list with components:

data

Long-format data frame with one row per (site, species, trait) observation: columns site, species, trait, value, site_species, predictors env_1, …, env_n_predictors, plus lon and lat if coords were generated.

truth

Named list of true parameter values (alpha, beta, Lambda_B, Lambda_W, S_B, S_W, sigma2_phy, sigma2_sp, sigma2_spa, spatial_range, sigma2_eps).

Cphy

The phylogenetic correlation matrix used (or NULL).

coords

Site coordinates used (or NULL).
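
Downstream code often needs one row per (site, species) with the traits in columns. The sketch below shows one way to do that in base R. It uses a small mock data frame with the documented column names; the values are made up for illustration and do not come from the simulator.

```r
# Mock long-format data with the documented columns (values are illustrative).
long <- expand.grid(trait = paste0("trait_", 1:3), species = 1:2, site = 1:2,
                    stringsAsFactors = FALSE)
long$site_species <- paste(long$site, long$species, sep = "_")
long$value <- seq_len(nrow(long)) / 10
long$env_1 <- c(-0.3, 0.8)[long$site]

# One row per site_species, one column per trait.
# v.names is given explicitly so that site, species and env_1 stay as id columns.
wide <- reshape(long, idvar = "site_species", timevar = "trait",
                v.names = "value", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
wide
```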

Details

This simulator is domain-specific: it produces a (site, species, trait) cube, which is the canonical functional-biogeography layout. The package as a whole works for any (unit, trait) stacked-trait GLLVM (unit may be site, individual, species, paper, ...); this particular simulator targets the site × species × trait special case used in the methods paper. For simpler simulations from a generic (unit, trait) design, build the data inline as in the morphometrics article.
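
As a rough illustration of building a generic (unit, trait) dataset inline, the sketch below stacks a reduced-rank latent structure plus trait-specific noise into long format. It is not the code from the morphometrics article; the names n_units, Lambda and psi are hypothetical and are not package arguments.

```r
# Sketch only: a generic (unit, trait) stacked-trait dataset built inline.
# n_units, Lambda and psi are illustrative names, not package arguments.
set.seed(7)
n_units <- 40
n_traits <- 4
d <- 2                                               # number of latent factors

Lambda <- matrix(rnorm(n_traits * d), n_traits, d)   # trait loadings
psi <- rep(0.3, n_traits)                            # trait-specific variances
alpha <- rnorm(n_traits)                             # trait intercepts

Z <- matrix(rnorm(n_units * d), n_units, d)          # latent scores per unit
E <- matrix(rnorm(n_units * n_traits), n_units, n_traits) %*% diag(sqrt(psi))
Y <- sweep(Z %*% t(Lambda) + E, 2, alpha, "+")       # n_units x n_traits

# Stack into long format: one row per (unit, trait).
long <- data.frame(
  unit  = rep(seq_len(n_units), times = n_traits),
  trait = rep(paste0("trait_", seq_len(n_traits)), each = n_units),
  value = as.vector(Y)
)
head(long)
```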

Each component can be turned on or off via the corresponding variance / loading argument. The default settings produce a fixed-effects-only dataset suitable for Stage-1 regression tests.
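
To make the fixed-effects-only case concrete, here is a minimal base-R sketch of y_sit = alpha_t + x_s'beta_t + e_sit. It is not the package's implementation. It assumes standard-normal predictors and at least one species per site, and it checks that a per-trait regression roughly recovers the true coefficients.

```r
# Sketch only: the fixed-effects-only case y_sit = alpha_t + x_s' beta_t + e_sit.
# Assumptions: standard-normal predictors, and at least one species per site.
set.seed(123)
n_sites <- 200; n_species <- 20; n_traits <- 3; n_predictors <- 2
mean_species_per_site <- 5
sigma2_eps <- 0.5

alpha <- rnorm(n_traits)
beta  <- matrix(rnorm(n_traits * n_predictors), n_traits, n_predictors)
X     <- matrix(rnorm(n_sites * n_predictors), n_sites, n_predictors)

# Poisson richness per site, capped at the regional pool size.
richness <- pmax(1, pmin(rpois(n_sites, mean_species_per_site), n_species))

rows <- lapply(seq_len(n_sites), function(s) {
  sp <- sort(sample.int(n_species, richness[s]))       # species present at site s
  g <- expand.grid(trait = seq_len(n_traits), species = sp, site = s)
  mu <- alpha[g$trait] + as.vector(beta[g$trait, , drop = FALSE] %*% X[s, ])
  g$value <- mu + rnorm(nrow(g), sd = sqrt(sigma2_eps))
  g$env_1 <- X[s, 1]
  g$env_2 <- X[s, 2]
  g
})
dat <- do.call(rbind, rows)

# A per-trait regression should roughly recover alpha[1] and beta[1, ].
fit <- lm(value ~ env_1 + env_2, data = dat[dat$trait == 1, ])
rbind(truth = c(alpha[1], beta[1, ]), estimate = coef(fit))
```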

Examples

set.seed(1)
sim <- simulate_site_trait(n_sites = 30, n_species = 8, n_traits = 3,
                           mean_species_per_site = 4)
head(sim$data)
#>   site species site_species   trait      value      env_1     env_2
#> 1    1       1          1_1 trait_1 -1.7358402 -0.3053884 0.7631757
#> 2    1       1          1_1 trait_2 -0.3452106 -0.3053884 0.7631757
#> 3    1       1          1_1 trait_3  0.2164905 -0.3053884 0.7631757
#> 4    1       2          1_2 trait_1 -1.1233488 -0.3053884 0.7631757
#> 5    1       2          1_2 trait_2 -0.5638725 -0.3053884 0.7631757
#> 6    1       2          1_2 trait_3  0.8311514 -0.3053884 0.7631757
sim$truth$alpha
#> [1] -0.6264538  0.1836433 -0.8356286
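
A further, hedged example of switching on the phylogenetic component: the sketch below builds a phylogenetic correlation matrix in base R from a random ultrametric tree (hclust stands in for a real phylogeny, which is an assumption of this example) and passes it through the documented Cphy and sigma2_phy arguments. The simulator call is guarded so the block also runs in a session where the package is not loaded.

```r
# Sketch only: a base-R phylogenetic correlation matrix from a random
# ultrametric tree (hclust stands in for a real phylogeny here).
set.seed(2)
n_species <- 8
n_traits <- 3

tree <- hclust(dist(matrix(rnorm(n_species * 2), n_species, 2)),
               method = "average")
H <- as.matrix(cophenetic(tree))   # merge height of each species pair
Cphy <- 1 - H / max(H)             # shared branch length, scaled to [0, 1]

# Cphy should be a valid correlation matrix.
stopifnot(isSymmetric(Cphy), all(diag(Cphy) == 1),
          min(eigen(Cphy, symmetric = TRUE)$values) > 0)

# Run the simulator only if it is available in the current session.
if (exists("simulate_site_trait", mode = "function")) {
  sim_phy <- simulate_site_trait(n_sites = 30, n_species = n_species,
                                 n_traits = n_traits, Cphy = Cphy,
                                 sigma2_phy = rep(0.5, n_traits), seed = 1)
  str(sim_phy$truth$sigma2_phy)
}
```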