
Pulls occurrence records from the Global Biodiversity Information Facility (GBIF) for each name in species, aggregates each species' records into a median latitude/longitude centroid, and returns a data.frame ready to be passed to impute via its covariates argument.

Usage

pull_gbif_centroids(
  species,
  cache_dir = NULL,
  occurrence_limit = 500L,
  sleep_ms = 100L,
  verbose = TRUE,
  refresh_cache = FALSE,
  store_points = FALSE
)

Arguments

species

character vector of species names.

cache_dir

directory in which to cache one RDS file per species. NULL (the default) disables caching, which is not recommended for production use.

occurrence_limit

integer, maximum number of occurrences to fetch per species (default 500). Requests are paginated at 300 records per GBIF call, so values above 300 take more than one call.

sleep_ms

integer, polite delay between API calls in milliseconds (default 100).

verbose

logical. When TRUE, prints a progress message every 50 species.

refresh_cache

logical. When TRUE, re-fetches from GBIF even if a cached file already exists for the species.

store_points

logical. When TRUE, also stores the individual filtered lat/lon occurrence points in each species' cache RDS under the points field. These are used by pull_worldclim_per_species for per-occurrence bioclim extraction. The default, FALSE, keeps the cache format used before v0.9.1.9006.
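The pagination described for occurrence_limit can be pictured as a list of (start, limit) pairs, one per GBIF request. The sketch below is only an illustration: gbif_pages() is a hypothetical helper, not part of the package, and the 300-record page size is taken from the argument description above rather than from the package source.

```r
# Hypothetical helper: split a per-species record budget into GBIF-sized pages.
# page_size = 300 mirrors the per-call maximum described for occurrence_limit.
gbif_pages <- function(occurrence_limit, page_size = 300L) {
  stopifnot(occurrence_limit >= 0, page_size > 0)
  if (occurrence_limit == 0) {
    return(data.frame(start = integer(0), limit = integer(0)))
  }
  # Offsets of each page: 0, 300, 600, ...
  starts <- seq(0L, occurrence_limit - 1L, by = page_size)
  # The last page only asks for the records still needed.
  limits <- pmin(page_size, occurrence_limit - starts)
  data.frame(start = as.integer(starts), limit = as.integer(limits))
}

# Default budget of 500: two pages, start 0 (limit 300) and start 300 (limit 200).
gbif_pages(500L)
```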

Value

A data.frame with columns species, centroid_lat, centroid_lon, and n_occurrences, with rownames set to species. Species with no usable GBIF records (no hits, or no records left after filtering) get NA centroids; the caller decides how to handle them (typically by dropping or imputing them).
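One way to apply the "drop" option is to keep only rows with both centroid coordinates present. The data.frame below is a made-up stand-in with the documented columns; its species labels and values are placeholders, not real GBIF results, and the n_occurrences value of 0 for the NA row is an assumption.

```r
# Illustrative stand-in for a pull_gbif_centroids() result (placeholder values).
cov <- data.frame(
  species       = c("sp_a", "sp_b", "sp_c"),
  centroid_lat  = c(40.1, NA, 35.2),
  centroid_lon  = c(-80.3, NA, -90.4),
  n_occurrences = c(120L, 0L, 45L)   # assumed count for the no-record species
)
rownames(cov) <- cov$species

# Drop species whose centroid could not be computed.
has_centroid <- !is.na(cov$centroid_lat) & !is.na(cov$centroid_lon)
cov_complete <- cov[has_centroid, , drop = FALSE]

# Rownames still identify species, so rows can be matched to other
# species-indexed data by name rather than by position.
rownames(cov_complete)
```

Matching by rownames matters after dropping rows, because the filtered table no longer lines up positionally with the input species vector.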

Details

Caching is strongly recommended: GBIF rate-limits anonymous calls, and each per-species fetch is expensive. With cache_dir set, each species gets one RDS file, and subsequent calls read that file instead of querying the API (unless refresh_cache = TRUE).
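The one-file-per-species cache can be sketched with saveRDS() and readRDS(). This is a hypothetical illustration: cached_fetch(), its file naming scheme, and the fetch_fun argument are assumptions, not the package's internals. A dummy fetcher stands in for the GBIF call so the sketch runs offline.

```r
# Hypothetical sketch of a one-RDS-file-per-species cache.
cached_fetch <- function(sp, cache_dir, fetch_fun, refresh_cache = FALSE) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  # Filesystem-safe name (assumed scheme): "Quercus alba" -> "Quercus_alba.rds"
  path <- file.path(cache_dir, paste0(gsub("[^A-Za-z0-9]+", "_", sp), ".rds"))
  if (!refresh_cache && file.exists(path)) {
    return(readRDS(path))   # cache hit: no API call
  }
  result <- fetch_fun(sp)   # cache miss or forced refresh: call the "API"
  saveRDS(result, path)
  result
}

# Dummy fetcher that counts how often it is called.
n_calls <- 0L
dummy_fetch <- function(sp) {
  n_calls <<- n_calls + 1L
  list(species = sp)
}

cache_demo <- file.path(tempdir(), "gbif-cache-demo")
unlink(cache_demo, recursive = TRUE)            # start from an empty cache
cached_fetch("Quercus alba", cache_demo, dummy_fetch)  # fetches and writes RDS
cached_fetch("Quercus alba", cache_demo, dummy_fetch)  # reads RDS, no fetch
n_calls                                         # the fetcher ran only once
```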

For each species, the function:

1. resolves the taxon via name_backbone;
2. fetches up to occurrence_limit records via occ_search (paginated at 300 records per GBIF call);
3. drops records flagged hasGeospatialIssues = TRUE, as well as records whose basisOfRecord is "FOSSIL_SPECIMEN" or "LIVING_SPECIMEN";
4. drops records with out-of-range coordinates (latitude outside [-90, 90] or longitude outside [-180, 180]);
5. takes the median latitude and the median longitude of the remaining records as the species centroid.
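The filtering and centroid steps can be sketched on a local occurrence table, leaving out the network calls. The column names decimalLatitude, decimalLongitude, hasGeospatialIssues, and basisOfRecord are GBIF occurrence fields; centroid_from_occurrences() is a hypothetical helper, the range check is an assumed reading of "out-of-range", and the six-row table is made up for illustration.

```r
# Hypothetical sketch of the QC filters plus coordinate-wise median centroid.
centroid_from_occurrences <- function(occ) {
  flagged  <- occ$hasGeospatialIssues %in% TRUE   # NA is treated as unflagged
  excluded <- occ$basisOfRecord %in% c("FOSSIL_SPECIMEN", "LIVING_SPECIMEN")
  in_range <- !is.na(occ$decimalLatitude) & !is.na(occ$decimalLongitude) &
    abs(occ$decimalLatitude) <= 90 & abs(occ$decimalLongitude) <= 180
  occ <- occ[which(!flagged & !excluded & in_range), , drop = FALSE]
  if (nrow(occ) == 0L) {
    # No usable records: NA centroid, so the species keeps its row downstream.
    return(c(centroid_lat = NA_real_, centroid_lon = NA_real_, n_occurrences = 0))
  }
  c(centroid_lat  = stats::median(occ$decimalLatitude),
    centroid_lon  = stats::median(occ$decimalLongitude),
    n_occurrences = nrow(occ))
}

# Made-up records: row 4 is out of range, row 5 is flagged, row 6 is a fossil.
occ <- data.frame(
  decimalLatitude     = c(10, 20, 30, 95, 40, 50),
  decimalLongitude    = c(-10, -20, -30, 0, -40, -50),
  hasGeospatialIssues = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
  basisOfRecord       = c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN",
                          "HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                          "HUMAN_OBSERVATION", "FOSSIL_SPECIMEN")
)
centroid_from_occurrences(occ)   # centroid from rows 1-3 only
```

A coordinate-wise median like this one is robust to a few stray points, but it takes latitude and longitude independently, so ranges that straddle the antimeridian (for example, points at 179 and -179 longitude) yield a median near 0.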

Species with zero post-filter records receive NA centroids; their rows are still included in the returned data.frame so it aligns with the input species list.

Requires the optional rgbif package (listed under Suggests in DESCRIPTION). If rgbif is not installed, the function stops with an error message asking you to install it.
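A Suggests-only dependency is commonly guarded with requireNamespace(). The sketch below shows that general idiom; require_suggested() and its message are hypothetical and not the package's exact code. If the check fails for rgbif, install.packages("rgbif") installs it from CRAN.

```r
# Common guard for an optional (Suggests) dependency; a sketch, not the
# package's exact code or wording.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(sprintf("Package '%s' is required. Install it with install.packages(\"%s\").",
                 pkg, pkg), call. = FALSE)
  }
  invisible(TRUE)
}

require_suggested("stats")     # base package, so this passes silently
# require_suggested("rgbif")   # errors if rgbif is not installed
```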

See also

impute (pass the centroid columns of the return value via its covariates argument; see Examples).

Examples

if (FALSE) { # \dontrun{
# Plant ecology example: pull centroids for a species list.
sp <- c("Quercus alba", "Pinus taeda", "Acer saccharum")
cov <- pull_gbif_centroids(sp, cache_dir = "script/data-cache/gbif")
# Use as covariates: keep only the numeric centroid columns
# (drop the bookkeeping columns species and n_occurrences).
cov_num <- cov[, c("centroid_lat", "centroid_lon"), drop = FALSE]
# Then: impute(traits, tree, covariates = cov_num)
} # }