Reconcile one dataset against multiple phylogenetic trees

Takes a single data frame and matches it against each tree in a named list, returning one reconciliation object per tree. This is the standard workflow for generating separate tree-compatible datasets aligned to different phylogenies (e.g., Clements 2023, 2024 and 2025, or Jetz 2012).

Usage

reconcile_to_trees(
  x,
  trees,
  x_species = NULL,
  authority = "col",
  rank = c("species", "subspecies"),
  overrides = NULL,
  db_version = NULL,
  fuzzy = FALSE,
  fuzzy_threshold = 0.9,
  resolve = c("flag", "first"),
  quiet = FALSE,
  x_label = NULL
)

Arguments

x

A data frame.

trees

A named list of ape::phylo objects or file paths.

x_species

A length-1 character vector. Column name in x containing species names. Auto-detected if NULL.

authority

A length-1 character vector, or NULL. Taxonomic authority used for synonym resolution (stage 3 of the cascade). One of:

"col" (default): Catalogue of Life — broad, curated, frequently updated. A sensible default for most taxa.
"itis": Integrated Taxonomic Information System — strong for North American vertebrates and plants.
"gbif": Global Biodiversity Information Facility backbone. Wider coverage; includes more recent synonymy.
"ncbi": NCBI Taxonomy — best when working with sequence data.
"ott": Open Tree of Life synthetic taxonomy. Useful when your downstream phylogeny is from the Open Tree synthesis.
"itis_test": A small bundled subset of ITIS, cached locally with taxadb for testing. Intended for examples and unit tests; not for analysis.
"gnverifier": HTTP-backed verification against ~100 sources via the Global Names verifier; no local database download. See vignette("getting-started") for the trade-off (wider coverage, requires network and the httr2 package).
NULL: Skip the synonym stage entirely. Useful for quick checks or when taxadb is unavailable. Stages 1 and 2 still run, as does stage 4 (fuzzy matching) when fuzzy = TRUE.

Five authority codes that earlier versions of the package advertised — "iucn", "tpl", "fb", "slb", "wd" — are no longer accepted. Empirical testing against taxadb v22.12 showed that iucn errors with a schema mismatch and the others are not taxadb providers at all. Passing one of those values now produces a helpful migration error.

rank

A length-1 character vector. Controls how trinomials are handled during normalisation:

"species" (default): Strip infraspecific epithets so that "Parus major major" becomes "Parus major" before matching.
"subspecies": Keep trinomials intact. Use this when your analysis operates at subspecies level.

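The effect of rank = "species" can be pictured with a few lines of base R. This is only a sketch of the idea, not the package's own normalisation code; strip_trinomial() is a hypothetical helper that keeps the first two words of each name.

```r
# Hypothetical helper: keep only genus + species epithet.
# Illustrates the idea behind rank = "species"; the package's
# actual normalisation may handle more cases than this.
strip_trinomial <- function(names) {
  parts <- strsplit(trimws(names), "\\s+")
  vapply(parts, function(p) paste(head(p, 2), collapse = " "), character(1))
}

strip_trinomial(c("Parus major major", "Parus major"))
#> [1] "Parus major" "Parus major"
```

With rank = "subspecies" no such collapsing takes place, so the two names above would remain distinct.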
overrides

Optional pre-built corrections. Either a data frame with at least columns name_x and name_y (plus an optional user_note column), or a file path to a CSV with the same columns. Any name listed here bypasses the cascade and is recorded as match_type = "manual". Useful for applying published crosswalks (see reconcile_crosswalk()) or for locking down decisions made in a previous run.
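
As a minimal sketch of the expected shape, an overrides table can be built in base R and, if desired, written to CSV. The column names come from the description above; the row content and the temporary file path are illustrative assumptions.

```r
# Illustrative overrides table: name_x is the name as it appears in x,
# name_y is the name it should be matched to. The row is made up.
overrides <- data.frame(
  name_x    = "Corvus brachyrhnchos",
  name_y    = "Corvus brachyrhynchos",
  user_note = "typo in source data",
  stringsAsFactors = FALSE
)

# The same table as a CSV file (temporary path used for illustration).
path <- tempfile(fileext = ".csv")
write.csv(overrides, path, row.names = FALSE)

# Read it back to confirm the columns survive the round trip.
roundtrip <- read.csv(path, stringsAsFactors = FALSE)
names(roundtrip)
#> [1] "name_x"    "name_y"    "user_note"
```

Either the data frame or the file path could then be supplied as overrides.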

db_version

A length-1 character vector. taxadb database snapshot to use (e.g. "22.12"). NULL (default) uses the latest available.

fuzzy

Logical. Enables the fuzzy-matching stage (stage 4 of the cascade) when TRUE. Default FALSE. Turn this on to catch likely typos (Corvus brachyrhnchos -> Corvus brachyrhynchos). When FALSE, stages 1–3 are unaffected.

fuzzy_threshold

Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9 (roughly "no more than ~10% of characters differ"). Lower values (e.g. 0.7) are more permissive but produce more false positives; always review fuzzy matches with reconcile_suggest() or reconcile_review() before trusting them.
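
To get a feel for the scale of the threshold, the sketch below computes a plain normalised edit-distance similarity with utils::adist(). This is a stand-in chosen for illustration: the package's score is described as genus-weighted and its exact formula is not given here, so real scores may differ. similarity() is a hypothetical helper.

```r
# Hypothetical helper: 1 - (Levenshtein distance / longer name length).
# A plain, unweighted stand-in for the package's genus-weighted score.
similarity <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  1 - d / max(nchar(a), nchar(b))
}

# One missing character in a 21-character name.
score_typo <- similarity("Corvus brachyrhnchos", "Corvus brachyrhynchos")
score_typo >= 0.9
#> [1] TRUE

# Two different species in the same genus.
score_other <- similarity("Corvus corax", "Corvus corone")
score_other >= 0.9
#> [1] FALSE
```

Under this simplified measure a single-character typo clears the default threshold, while a different congener does not.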

resolve

A length-1 character vector. What to do with borderline matches:

"flag" (default): Mark low-confidence fuzzy matches (score below flag_threshold) and names with indirect taxadb synonymy as match_type = "flagged" so you can audit them with reconcile_review() or reconcile_suggest().
"first": Accept the highest-scoring candidate silently, without flagging. Faster but riskier; use only when you have already reviewed the ambiguities.

quiet

Logical. Suppresses progress messages when TRUE. Default FALSE.

x_label

A length-1 character vector or NULL. Human-readable label for source x, stored in the reconciliation metadata and shown in print() / format(). Defaults to the expression passed as x (via deparse(substitute(x))). Set this explicitly when calling this function inside another function so that the label reflects the real data source rather than the local argument name.
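
The reason for setting the label explicitly inside a wrapper is easy to reproduce in base R. In this sketch label_of(), wrapper() and my_traits are hypothetical names used only to show how deparse(substitute(x)) behaves.

```r
# Hypothetical stand-in for a function that derives a default label
# from the expression supplied as its argument.
label_of <- function(x) deparse(substitute(x))

# A wrapper that forwards its own argument.
wrapper <- function(dat) label_of(dat)

my_traits <- data.frame(a = 1)

label_of(my_traits)
#> [1] "my_traits"

wrapper(my_traits)
#> [1] "dat"
```

Called directly, the default label is the name of the data object; called through the wrapper, it becomes the wrapper's local argument name, which is why an explicit x_label is preferable there.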

Value

A named list of reconciliation objects, one per tree, with the same names as trees.

Details

Species names in x are normalised once and reused across all trees, so synonym lookups are not repeated.
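
The normalise-once pattern can be sketched in base R as follows. The normalisation rule (trim, underscores to spaces, lower case), the helper normalise() and the example names are all assumptions made for illustration, not the package's internals.

```r
# Assumed normalisation rule, for illustration only.
normalise <- function(x) tolower(gsub("_", " ", trimws(x)))

data_names <- c("Parus_major", "Corvus corax ")
tip_sets <- list(
  a = c("Parus major"),
  b = c("corvus corax", "Parus major")
)

# Normalise the data names a single time...
norm_x <- normalise(data_names)

# ...then reuse the result for every set of tree tips.
matched <- lapply(tip_sets, function(tips) norm_x %in% normalise(tips))
matched$a
#> [1]  TRUE FALSE
matched$b
#> [1] TRUE TRUE
```

The work done on the data names is shared across trees; only the per-tree comparison is repeated.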

Examples

data(avonet_subset)
data(tree_jetz)
data(tree_clements25)
results <- reconcile_to_trees(
  avonet_subset,
  trees = list(jetz = tree_jetz, clements = tree_clements25),
  x_species = "Species1",
  authority = NULL
)
#> ℹ Reconciling 919 data names against 2 trees
#> ℹ   [jetz] 657 tips
#> ℹ Matching 919 x 657 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔   [jetz] Matched 657/919 names
#> ℹ   [clements] 854 tips
#> ℹ Matching 919 x 854 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔   [clements] Matched 854/919 names
# Compare the number of exact matches across trees
sapply(results, function(r) r$counts$n_exact)
#>     jetz clements 
#>        0        0