Pulls occurrence records from the Global Biodiversity Information
Facility (GBIF) for each name in species, aggregates them
to a median latitude/longitude centroid per species, and returns a
data.frame ready to be passed to impute via its covariates argument.
Usage
pull_gbif_centroids(
species,
cache_dir = NULL,
occurrence_limit = 500L,
sleep_ms = 100L,
verbose = TRUE,
refresh_cache = FALSE,
store_points = FALSE
)
Arguments
- species
character vector of species names.
- cache_dir
directory to cache per-species RDS files.
NULL (default) disables caching, which is not recommended for production use.
- occurrence_limit
integer, maximum number of occurrences to fetch per species (default 500; paginated if > 300).
- sleep_ms
integer, polite delay between API calls in milliseconds (default 100).
- verbose
logical, print progress every 50 species.
- refresh_cache
logical, force a re-fetch even when a cached file exists.
- store_points
logical. When TRUE, persists the raw filtered lat/lon occurrence points in each species' cache RDS under the points field. Used by pull_worldclim_per_species for per-occurrence bioclim extraction. The default, FALSE, preserves the pre-v0.9.1.9006 cache format.
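How occurrence_limit and sleep_ms interact can be pictured with a small pure-R sketch. It assumes pages of at most 300 records requested by start offset and limit, with a sleep_ms pause between calls. gbif_page_plan(), fetch_paged() and the fetch_page callback are hypothetical names for illustration only, not part of the package or of rgbif, and the real implementation may paginate differently.

```r
# Hypothetical helper: split occurrence_limit into pages of at most
# page_size records, each described by a start offset and a limit.
gbif_page_plan <- function(occurrence_limit = 500L, page_size = 300L) {
  stopifnot(occurrence_limit >= 0, page_size > 0)
  if (occurrence_limit == 0) {
    return(data.frame(start = integer(0), limit = integer(0)))
  }
  starts <- seq(0L, occurrence_limit - 1L, by = page_size)
  limits <- pmin(page_size, occurrence_limit - starts)
  data.frame(start = as.integer(starts), limit = as.integer(limits))
}

# Hypothetical driver: fetch_page(start, limit) stands in for one API call
# (an assumption, not a real rgbif function). It pauses sleep_ms
# milliseconds between calls and stops early once a page comes back short.
fetch_paged <- function(fetch_page, occurrence_limit = 500L, sleep_ms = 100L) {
  plan <- gbif_page_plan(occurrence_limit)
  pages <- vector("list", nrow(plan))
  for (i in seq_len(nrow(plan))) {
    if (i > 1L) Sys.sleep(sleep_ms / 1000)  # polite delay between calls
    pages[[i]] <- fetch_page(start = plan$start[i], limit = plan$limit[i])
    if (NROW(pages[[i]]) < plan$limit[i]) break  # source exhausted
  }
  do.call(rbind, pages)  # NULL entries for unused pages are skipped
}

# With the default limit of 500 the plan is two pages: 300 + 200 records.
plan <- gbif_page_plan(500L)

# A fake source holding only 350 records: the second page comes back
# short, so the loop stops there and returns all 350 rows.
fake_page <- function(start, limit) {
  n <- max(0, min(limit, 350 - start))
  data.frame(id = start + seq_len(n))
}
occ <- fetch_paged(fake_page, occurrence_limit = 500L, sleep_ms = 0L)
```

Stopping on a short page avoids spending a request (and a sleep_ms pause) on species that have fewer records than occurrence_limit.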
Value
A data.frame with columns species, centroid_lat,
centroid_lon, n_occurrences. Row names are set to the
species names. Species with no GBIF hits get NA centroids;
the caller should decide how to handle them (typically by
dropping or imputing them).
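A caller that chooses to drop species without centroids might do something like the following minimal sketch. The data.frame is a made-up stand-in that only mimics the documented return shape; the species names and values are placeholders, not GBIF results.

```r
# Placeholder data with the documented columns and species row names.
cov <- data.frame(
  species = c("sp_a", "sp_b", "sp_c"),
  centroid_lat = c(10, 20, NA),
  centroid_lon = c(-70, -80, NA),
  n_occurrences = c(12L, 7L, 0L),
  row.names = c("sp_a", "sp_b", "sp_c"),
  stringsAsFactors = FALSE
)

# Keep only species whose centroid is fully observed.
has_centroid <- stats::complete.cases(cov[, c("centroid_lat", "centroid_lon")])
dropped <- rownames(cov)[!has_centroid]
cov_ok <- cov[has_centroid, , drop = FALSE]

if (length(dropped) > 0) {
  message("Dropping ", length(dropped), " species without GBIF centroids: ",
          paste(dropped, collapse = ", "))
}
```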
Details
Caching is strongly recommended: GBIF rate-limits anonymous calls,
and the per-species fetch is expensive. With cache_dir set,
each species gets one RDS file, and subsequent calls read that file
instead of querying the API (unless refresh_cache = TRUE).
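The read-through cache described above can be sketched in base R. The file-naming rule, cache_path(), cached_fetch() and the fetch callback are hypothetical illustrations, not the package's actual cache layout; only the one-RDS-file-per-species idea and the refresh_cache behavior mirror what this page documents.

```r
# Hypothetical naming rule: one RDS file per species, with runs of
# non-alphanumeric characters replaced, e.g. "Quercus alba" -> "Quercus_alba.rds".
cache_path <- function(cache_dir, species) {
  safe <- gsub("[^A-Za-z0-9]+", "_", species)
  file.path(cache_dir, paste0(safe, ".rds"))
}

# Read-through cache: return the cached result if present, otherwise
# call fetch(species) (a stand-in for the expensive GBIF lookup) and save it.
cached_fetch <- function(species, fetch, cache_dir = NULL, refresh_cache = FALSE) {
  if (is.null(cache_dir)) return(fetch(species))  # caching disabled
  path <- cache_path(cache_dir, species)
  if (!refresh_cache && file.exists(path)) return(readRDS(path))
  result <- fetch(species)
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}

# Usage with a fake fetcher that counts how often it is called.
n_calls <- 0L
fake_fetch <- function(sp) {
  n_calls <<- n_calls + 1L
  list(species = sp)
}
demo_dir <- file.path(tempdir(), "gbif-cache-demo")
unlink(demo_dir, recursive = TRUE)  # start from an empty cache
first <- cached_fetch("Quercus alba", fake_fetch, cache_dir = demo_dir)
second <- cached_fetch("Quercus alba", fake_fetch, cache_dir = demo_dir)  # read from disk
```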
For each species: resolve taxon via
name_backbone, fetch up to
occurrence_limit records via occ_search
(paginated at 300 per GBIF call), filter out records with
hasGeospatialIssues = TRUE and basisOfRecord in
c("FOSSIL_SPECIMEN", "LIVING_SPECIMEN"), drop
out-of-range coordinates, then take the median latitude and
longitude as the species centroid.
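The filter-then-median step can be sketched on an in-memory data.frame of toy records. This assumes Darwin Core-style column names (decimalLatitude, decimalLongitude, basisOfRecord, hasGeospatialIssues) like those in GBIF occurrence data; gbif_centroid() is a hypothetical helper, and treating a missing hasGeospatialIssues flag as "no issue" and counting n_occurrences after filtering are assumptions.

```r
# Hypothetical helper: filter one species' occurrences, then take medians.
gbif_centroid <- function(occ) {
  excluded_basis <- c("FOSSIL_SPECIMEN", "LIVING_SPECIMEN")
  lat <- occ$decimalLatitude
  lon <- occ$decimalLongitude
  keep <- !(occ$hasGeospatialIssues %in% TRUE) &   # NA flag treated as no issue (assumption)
    !(occ$basisOfRecord %in% excluded_basis) &
    !is.na(lat) & !is.na(lon) &
    abs(lat) <= 90 & abs(lon) <= 180               # drop out-of-range coordinates
  if (!any(keep)) {
    return(c(centroid_lat = NA_real_, centroid_lon = NA_real_, n_occurrences = 0))
  }
  c(centroid_lat = stats::median(lat[keep]),
    centroid_lon = stats::median(lon[keep]),
    n_occurrences = sum(keep))                     # post-filter count (assumption)
}

# Toy records: row 4 (latitude 95), row 5 (fossil) and row 6 (geospatial
# issue) are dropped, leaving latitudes 10, 20, 30 and longitudes -70, -80, -90.
occ <- data.frame(
  decimalLatitude = c(10, 20, 30, 95, 40, 0),
  decimalLongitude = c(-70, -80, -90, 0, -100, 0),
  basisOfRecord = c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN", "HUMAN_OBSERVATION",
                    "HUMAN_OBSERVATION", "FOSSIL_SPECIMEN", "HUMAN_OBSERVATION"),
  hasGeospatialIssues = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
res <- gbif_centroid(occ)
```

The median, rather than the mean, keeps a handful of stray or mis-georeferenced points from dragging the centroid far from where most records lie.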
Species with zero post-filter records receive NA centroids;
their rows are still included in the returned data.frame so it
aligns with the input species list.
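Keeping the output aligned with the input can be sketched as an lapply() over the requested names. assemble_centroids() and the shape of results (a named list with no entry for species that had no usable records) are hypothetical, and reporting n_occurrences = 0 for those species is likewise an assumption.

```r
# Hypothetical assembler: one output row per requested species, in input
# order, with NA centroids where no per-species result exists.
assemble_centroids <- function(species, results) {
  rows <- lapply(species, function(sp) {
    r <- results[[sp]]  # NULL when sp has no entry
    if (is.null(r)) {
      r <- list(centroid_lat = NA_real_, centroid_lon = NA_real_, n_occurrences = 0L)
    }
    data.frame(species = sp, centroid_lat = r$centroid_lat,
               centroid_lon = r$centroid_lon, n_occurrences = r$n_occurrences,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- species
  out
}

# Placeholder input: only sp_a has a result, so sp_b gets an NA row.
results <- list(sp_a = list(centroid_lat = 10, centroid_lon = -70, n_occurrences = 12L))
out <- assemble_centroids(c("sp_a", "sp_b"), results)
```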
Requires the optional rgbif package (listed under Suggests in
DESCRIPTION). If rgbif is not installed, the function stops
with an error explaining how to install it.
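A check for a Suggests dependency conventionally uses requireNamespace(). The following minimal sketch shows that idiom; require_suggested() is a hypothetical helper, and its message text is illustrative rather than the wording pull_gbif_centroids actually uses.

```r
# Stop with an install hint when an optional (Suggests) package is missing.
require_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is required for this function. ",
         "Install it with install.packages(\"", pkg, "\").", call. = FALSE)
  }
  invisible(TRUE)
}

# Typical call at the top of a function that needs rgbif:
# require_suggested("rgbif")
```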
See also
impute (pass the return value as
covariates).
Examples
if (FALSE) { # \dontrun{
# Plant ecology example: pull centroids for a species list.
sp <- c("Quercus alba", "Pinus taeda", "Acer saccharum")
cov <- pull_gbif_centroids(sp, cache_dir = "script/data-cache/gbif")
# Use as covariates (drop the bookkeeping columns)
cov_num <- cov[, c("centroid_lat", "centroid_lon"), drop = FALSE]
# Then: impute(traits, tree, covariates = cov_num)
} # }
