Reconcile one dataset against multiple phylogenetic trees
Source: R/reconcile_to_trees.R
Takes a single data frame and matches it against each tree in a named
list, returning one reconciliation object per tree. This is the
standard workflow for generating separate tree-compatible datasets
aligned to different phylogenies (e.g., Clements 2023, 2024, 2025,
Jetz 2012).
Arguments
- x
A data frame.
- trees
A named list of ape::phylo objects or file paths.
- x_species
A length-1 character vector. Column name in x containing species names. Auto-detected if NULL.
- authority
A length-1 character vector, or NULL. Taxonomic authority used for synonym resolution (stage 3 of the cascade). One of:
"col" (default): Catalogue of Life. Broad, curated, frequently updated. A sensible default for most taxa.
"itis": Integrated Taxonomic Information System. Strong for North American vertebrates and plants.
"gbif": Global Biodiversity Information Facility backbone. Wider coverage; includes more recent synonymy.
"ncbi": NCBI Taxonomy. Best when working with sequence data.
"ott": Open Tree of Life synthetic taxonomy. Useful when your downstream phylogeny is from the Open Tree synthesis.
"itis_test": A small bundled subset of ITIS, cached locally with taxadb for testing. Intended for examples and unit tests; not for analysis.
"gnverifier": HTTP-backed verification against ~100 sources via the Global Names verifier; no local database download. See vignette("getting-started") for the trade-off (wider coverage, requires network and the httr2 package).
NULL: Skip the synonym stage entirely. Useful for quick checks or when taxadb is unavailable. Stages 1 and 2 still run, as does stage 4 when fuzzy = TRUE.
Five authority codes that earlier versions of the package advertised ("iucn", "tpl", "fb", "slb", "wd") are no longer accepted. Empirical testing against taxadb v22.12 showed that "iucn" errors with a schema mismatch and the others are not taxadb providers at all. Passing one of those values now produces a helpful migration error.
- rank
A length-1 character vector. Controls how trinomials are handled during normalisation:
"species" (default): Strip infraspecific epithets so that "Parus major major" becomes "Parus major" before matching.
"subspecies": Keep trinomials intact. Use this when your analysis operates at subspecies level.
- overrides
Optional pre-built corrections. Either a data frame with at least columns name_x and name_y (plus an optional user_note column), or a file path to a CSV with the same columns. Any name listed here bypasses the cascade and is recorded as match_type = "manual". Useful for applying published crosswalks (see reconcile_crosswalk()) or for locking down decisions made in a previous run.
- db_version
A length-1 character vector. taxadb database snapshot to use (e.g. "22.12"). NULL (default) uses the latest available.
- fuzzy
Logical. Enables the fuzzy-matching stage when TRUE. Default FALSE. Turn this on to catch likely typos (Corvus brachyrhnchos -> Corvus brachyrhynchos). When FALSE, stages 1–3 still run.
- fuzzy_threshold
Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9 (roughly "no more than ~10% of characters differ"). Lower values (e.g. 0.7) are more permissive but produce more false positives; always review fuzzy matches with reconcile_suggest() or reconcile_review() before trusting them.
- resolve
A length-1 character vector. What to do with borderline matches:
"flag" (default): Mark low-confidence fuzzy matches (score below flag_threshold) and names with indirect taxadb synonymy as match_type = "flagged" so you can audit them with reconcile_review() or reconcile_suggest().
"first": Accept the highest-scoring candidate silently, without flagging. Faster but riskier; use only when you have already reviewed the ambiguities.
- quiet
Logical. Suppresses progress messages when TRUE. Default FALSE.
- x_label
A length-1 character vector or NULL. Human-readable label for source x stored in the reconciliation metadata and shown in print()/format(). Defaults to the expression passed as x (via deparse(substitute())). Set this explicitly when calling reconcile_data() inside another function so the label reflects the real data source rather than the local argument name.
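As a minimal sketch of the overrides format described in the argument list, the base R snippet below builds a table with the documented name_x, name_y and user_note columns and round-trips it through a CSV file. The name pair, the note text, and the reading of name_x as the data-side name and name_y as the tree-side name are illustrative assumptions; the commented-out call only indicates where the table or path would be supplied.

```r
# Hypothetical override table. Column names follow the documented layout
# (name_x, name_y, optional user_note); the row itself is illustrative only.
overrides <- data.frame(
  name_x    = "Corvus brachyrhnchos",   # assumed: name as written in the data
  name_y    = "Corvus brachyrhynchos",  # assumed: name as written in the tree
  user_note = "typo corrected by hand",
  stringsAsFactors = FALSE
)

# Either pass the data frame directly, or save it and pass the file path.
overrides_path <- tempfile(fileext = ".csv")
write.csv(overrides, overrides_path, row.names = FALSE)

# Read the file back to confirm the columns survive the round trip.
overrides_check <- read.csv(overrides_path, stringsAsFactors = FALSE)

# Not run: needs the package and real inputs (x and trees are placeholders).
# results <- reconcile_to_trees(x, trees = trees, overrides = overrides_path)
```

Keeping the corrections in a CSV makes it easier to reapply the same decisions in a later run.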
Value
A named list of reconciliation objects, one per tree, with
the same names as trees.
Details
Species names in x are normalised once and reused across all trees,
so synonym lookups are not repeated.
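The reuse described here can be illustrated with a small base R sketch: the data names are normalised once, and the same normalised vector is then matched against each tree's tip labels. The helper normalise_names(), the toy names, and the tip-label vectors are hypothetical stand-ins rather than the package's internals; with real ape phylo objects the tip labels would come from tree$tip.label.

```r
# Hypothetical normaliser (an assumption, not the package's implementation):
# replace underscores, squeeze whitespace, and keep genus + species only,
# loosely mirroring the documented rank = "species" behaviour.
normalise_names <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)
  x <- trimws(gsub("\\s+", " ", x))
  parts <- strsplit(x, " ", fixed = TRUE)
  vapply(
    parts,
    function(p) paste(p[seq_len(min(2L, length(p)))], collapse = " "),
    character(1)
  )
}

data_names <- c("Parus major major", "Corvus_corax", "Pica pica")

# Toy tip labels standing in for tree$tip.label of two phylogenies.
tips <- list(
  tree_a = c("Parus_major", "Corvus_corax"),
  tree_b = c("Pica_pica", "Parus_major")
)

# Normalise the data names once...
norm_x <- normalise_names(data_names)

# ...then reuse that single result for every tree.
matched <- lapply(tips, function(tip) norm_x %in% normalise_names(tip))
n_matched <- vapply(matched, sum, integer(1))
```

Here n_matched is a named integer vector with one entry per tree, the same shape as the named list the function returns.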
See also
reconcile_tree() for the single-tree case;
reconcile_diff() to compare two reconciliations (e.g. to quantify
how many species are gained or lost by switching taxonomies).
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
data(tree_clements25)
results <- reconcile_to_trees(
  avonet_subset,
  trees = list(jetz = tree_jetz, clements = tree_clements25),
  x_species = "Species1",
  authority = NULL
)
#> ℹ Reconciling 919 data names against 2 trees
#> ℹ [jetz] 657 tips
#> ℹ Matching 919 x 657 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔ [jetz] Matched 657/919 names
#> ℹ [clements] 854 tips
#> ℹ Matching 919 x 854 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔ [clements] Matched 854/919 names
# Compare exact-match counts across trees
sapply(results, function(r) r$counts$n_exact)
#>     jetz clements
#>        0        0