Reconcile several datasets against one phylogenetic tree
Source: R/reconcile_multi.R
reconcile_multi.Rd
Match several trait or occurrence datasets against a single phylogenetic tree in one call. Species that appear in more than one dataset are reconciled once; the combined mapping records which dataset(s) each species belongs to, making it easy to identify the set of species with complete trait coverage.
Arguments
- datasets
A named list of data frames. The names are used as dataset labels (e.g. morpho, nests, plumage) in the output.
- tree
An ape::phylo object, or a path to a Newick/Nexus file.
- species_cols
Character vector. Species column name in each dataset. If length 1, the same column name is used for every dataset; otherwise supply one name per dataset, in the same order as datasets. Auto-detected from each data frame if NULL.
- authority
A length-1 character vector, or NULL. Taxonomic authority used for synonym resolution (stage 3 of the cascade). One of:
"col" (default): Catalogue of Life. Broad, curated, frequently updated. A sensible default for most taxa.
"itis": Integrated Taxonomic Information System. Strong for North American vertebrates and plants.
"gbif": Global Biodiversity Information Facility backbone. Wider coverage; includes more recent synonymy.
"ncbi": NCBI Taxonomy. Best when working with sequence data.
"ott": Open Tree of Life synthetic taxonomy. Useful when your downstream phylogeny is from the Open Tree synthesis.
"itis_test": A small bundled subset of ITIS, cached locally with taxadb for testing. Intended for examples and unit tests; not for analysis.
"gnverifier": HTTP-backed verification against ~100 sources via the Global Names verifier; no local database download. See vignette("getting-started") for the trade-off (wider coverage, requires network and the httr2 package).
NULL: Skip the synonym stage entirely. Useful for quick checks or when taxadb is unavailable. Stages 1 and 2 still run, as does stage 4 (fuzzy matching) when fuzzy = TRUE.
Five authority codes that earlier versions of the package advertised ("iucn", "tpl", "fb", "slb", "wd") are no longer accepted. Empirical testing against taxadb v22.12 showed that "iucn" errors with a schema mismatch and the others are not taxadb providers at all. Passing one of those values now produces a helpful migration error.
- rank
A length-1 character vector. Controls how trinomials are handled during normalisation:
"species" (default): Strip infraspecific epithets so that "Parus major major" becomes "Parus major" before matching.
"subspecies": Keep trinomials intact. Use this when your analysis operates at subspecies level.
- overrides
Optional pre-built corrections. Either a data frame with at least columns name_x and name_y (plus an optional user_note column), or a file path to a CSV with the same columns. Any name listed here bypasses the cascade and is recorded as match_type = "manual". Useful for applying published crosswalks (see reconcile_crosswalk()) or for locking down decisions made in a previous run.
- db_version
A length-1 character vector. The taxadb database snapshot to use (e.g. "22.12"). NULL (default) uses the latest available.
- fuzzy
Logical. Enables the fuzzy-matching stage when TRUE. Default FALSE. Turn this on to catch likely typos (Corvus brachyrhnchos -> Corvus brachyrhynchos). When FALSE, stages 1–3 still run.
- fuzzy_threshold
Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9 (roughly "no more than ~10% of characters differ"). Lower values (e.g. 0.7) are more permissive but produce more false positives; always review fuzzy matches with reconcile_suggest() or reconcile_review() before trusting them.
- resolve
A length-1 character vector. What to do with borderline matches:
"flag" (default): Mark low-confidence fuzzy matches (score below flag_threshold) and names with indirect taxadb synonymy as match_type = "flagged" so you can audit them with reconcile_review() or reconcile_suggest().
"first": Accept the highest-scoring candidate silently, without flagging. Faster but riskier; use only when you have already reviewed the ambiguities.
- quiet
Logical. Suppresses progress messages when TRUE. Default FALSE.
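To make the rank and fuzzy_threshold descriptions above more concrete, here is a minimal base-R sketch of the two ideas. It is not the package's implementation: strip_infraspecific() and plain_similarity() are hypothetical helpers, and the score is a plain, unweighted edit-distance similarity rather than the genus-weighted score the package describes.

```r
# Hypothetical helper: keep only the first two whitespace-separated words
# (genus + specific epithet), similar in spirit to rank = "species".
strip_infraspecific <- function(x) {
  x <- trimws(x)
  # Names with fewer than two words do not match and are returned unchanged.
  sub("^(\\S+\\s+\\S+).*$", "\\1", x, perl = TRUE)
}

# Hypothetical helper: similarity = 1 - (edit distance / length of longer name).
# ASSUMPTION: unweighted; the package's genus-weighted score may differ.
plain_similarity <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  1 - d / max(nchar(a), nchar(b))
}

stripped <- strip_infraspecific(c("Parus major major", "Corvus corax"))

# The typo from the fuzzy argument description: one missing character.
score <- plain_similarity("Corvus brachyrhnchos", "Corvus brachyrhynchos")
accepted <- score >= 0.9  # compare against the documented default threshold
```

Under this simplified score the example typo differs by one character out of 21, so it clears the default threshold of 0.9; a lower threshold such as 0.7 would also admit considerably more distant pairs, which is why reviewing fuzzy matches is recommended.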
Value
A reconciliation object. The mapping tibble gains one
logical column per input dataset (e.g. in_morpho, in_nests)
indicating which datasets contained each species.
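As a rough illustration of how the per-dataset logical columns could be used to find species with complete trait coverage, the sketch below works on a toy data frame. Only the in_morpho and in_nests column names follow the description above; the name column and all values are made up, and in practice the table would presumably come from reconcile_mapping().

```r
# Toy stand-in for the mapping table. ASSUMPTION: the `name` column and the
# values are illustrative only; just the in_* columns follow the documentation.
mapping <- data.frame(
  name = c("Parus major", "Corvus corax", "Pica pica"),
  in_morpho = c(TRUE, TRUE, FALSE),
  in_nests = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Find every in_* indicator column, however many datasets were supplied.
in_cols <- grep("^in_", names(mapping), value = TRUE)

# A species has complete coverage when it is present in all datasets.
complete <- Reduce(`&`, mapping[in_cols])
complete_species <- mapping$name[complete]
```

Selecting the indicator columns by prefix keeps the filter independent of the number and names of the datasets passed in.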
See also
reconcile_tree() for the single-dataset case;
reconcile_merge() to join two datasets after reconciliation.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(nesttrait_subset)
data(tree_jetz)
datasets <- list(
morpho = avonet_subset,
nests = nesttrait_subset
)
result <- reconcile_multi(datasets, tree_jetz,
species_cols = c("Species1", "Scientific_name"),
authority = NULL)
#> ℹ Reconciling 919 unique names from 2 datasets vs 657 tree tips
#> ℹ Matching 919 x 657 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔ Matched 657/919 unique names to tree tips
print(result)
#>
#> ── Reconciliation: multiple datasets vs tree ───────────────────────────────────
#> Source x: morpho, nests
#> Source y: phylo (657 tips)
#> Authority: none
#> Timestamp: 2026-06-16 10:09:53
#> ℹ Match coverage: [█████████████████████░░░░░░░░░] 71% (657/919)
#>
#> ── Match summary ──
#>
#> • Exact: 0 ( 0.0%)
#> • Normalized: 657 (71.5%)
#> • Synonym: 0 ( 0.0%)
#> • Fuzzy: 0 ( 0.0%)
#> • Manual: 0 ( 0.0%)
#> ! Unresolved (x only):262 (28.5%)
#> ! Unresolved (y only):0
#> ! Flagged for review: 0
#> ℹ Use `reconcile_summary()` for details, `reconcile_mapping()` for the full table.
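Names left unresolved in a run like the one above could be corrected through the overrides argument, which accepts a data frame or CSV with columns name_x and name_y (and an optional user_note). The following is a minimal sketch of building such a table in base R; the species pair is made up for illustration, and reading name_x as the dataset-side name and name_y as the tree-side name is an assumption based on the "Source x" and "Source y" labels in the printed output.

```r
# Illustrative overrides table. ASSUMPTION: name_x is the name as it appears
# in a dataset, name_y the tree tip label it should map to.
overrides <- data.frame(
  name_x = "Corvus brachyrhnchos",
  name_y = "Corvus brachyrhynchos",
  user_note = "typo in source dataset",
  stringsAsFactors = FALSE
)

# The argument is also documented to accept a CSV path with the same columns.
overrides_path <- tempfile(fileext = ".csv")
write.csv(overrides, overrides_path, row.names = FALSE)

# Read the file back to confirm the columns survive the round trip.
roundtrip <- read.csv(overrides_path, stringsAsFactors = FALSE)

# Untested sketch of the call, reusing the objects from the example above:
# result2 <- reconcile_multi(datasets, tree_jetz,
#                            species_cols = c("Species1", "Scientific_name"),
#                            authority = NULL,
#                            overrides = overrides)
```

Keeping the corrections in a CSV makes it straightforward to reapply the same decisions in a later run.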