Reconcile several datasets against one phylogenetic tree

Match several trait or occurrence datasets against a single phylogenetic tree in one call. Species that appear in more than one dataset are reconciled once; the combined mapping records which dataset(s) each species belongs to, making it easy to identify the set of species with complete trait coverage.

Usage

reconcile_multi(
  datasets,
  tree,
  species_cols = NULL,
  authority = "col",
  rank = c("species", "subspecies"),
  overrides = NULL,
  db_version = NULL,
  fuzzy = FALSE,
  fuzzy_threshold = 0.9,
  resolve = c("flag", "first"),
  quiet = FALSE
)

Arguments

datasets

A named list of data frames. The names are used as dataset labels (e.g. morpho, nests, plumage) in the output.

tree

An ape::phylo object, or a path to a Newick/Nexus file.

species_cols

Character vector giving the name of the species column in each dataset, one entry per dataset in the same order as datasets. If length 1, the same column name is used for every dataset. Auto-detected from each data frame if NULL.

authority

A length-1 character vector, or NULL. Taxonomic authority used for synonym resolution (stage 3 of the cascade). One of:

"col" (default): Catalogue of Life — broad, curated, frequently updated. A sensible default for most taxa.
"itis": Integrated Taxonomic Information System — strong for North American vertebrates and plants.
"gbif": Global Biodiversity Information Facility backbone. Wider coverage; includes more recent synonymy.
"ncbi": NCBI Taxonomy — best when working with sequence data.
"ott": Open Tree of Life synthetic taxonomy. Useful when your downstream phylogeny is from the Open Tree synthesis.
"itis_test": A small bundled subset of ITIS, cached locally with taxadb for testing. Intended for examples and unit tests; not for analysis.
"gnverifier": HTTP-backed verification against ~100 sources via the Global Names verifier; no local database download. See vignette("getting-started") for the trade-off (wider coverage, requires network and the httr2 package).
NULL: Skip the synonym stage entirely. Useful for quick checks or when taxadb is unavailable. Stages 1 and 2 still run, as does stage 4 when fuzzy = TRUE.

Five authority codes that earlier versions of the package advertised ("iucn", "tpl", "fb", "slb", "wd") are no longer accepted. Empirical testing against taxadb v22.12 showed that "iucn" fails with a schema mismatch and that the others are not taxadb providers at all. Passing one of these values now raises an error with migration guidance.

rank

A length-1 character vector. Controls how trinomials are handled during normalisation:

"species" (default): Strip infraspecific epithets so that "Parus major major" becomes "Parus major" before matching.
"subspecies": Keep trinomials intact. Use this when your analysis operates at subspecies level.

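The effect of rank = "species" on a trinomial can be pictured with a few lines of base R. This is only an illustrative sketch: strip_infraspecific() is a hypothetical helper, not a function of this package, and the package's own normalisation step may well handle more cases than this.

```r
# Hypothetical helper (not part of the package): keep only the first two
# whitespace-separated words, i.e. genus + specific epithet.
strip_infraspecific <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  vapply(parts, function(p) paste(head(p, 2), collapse = " "), character(1))
}

names_in  <- c("Parus major major", "Corvus corax", "Motacilla alba yarrellii")
names_out <- strip_infraspecific(names_in)
print(names_out)
```

Binomials pass through unchanged; only the third word of a trinomial is dropped.
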
overrides

Optional pre-built corrections. Either a data frame with at least columns name_x and name_y (plus an optional user_note column), or a file path to a CSV with the same columns. Any name listed here bypasses the cascade and is recorded as match_type = "manual". Useful for applying published crosswalks (see reconcile_crosswalk()) or for locking down decisions made in a previous run.
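
As a rough sketch, an overrides table with the documented columns could be built as below. The name pairs are illustrative only, and the reading of name_x as the name in the data and name_y as the name in the tree is an assumption based on the x/y labelling in the printed output.

```r
# Toy overrides table; the name pairs are illustrative only.
# Assumption: name_x = name as it appears in the data,
#             name_y = name as it appears in the tree.
overrides <- data.frame(
  name_x    = c("Parus caeruleus", "Carduelis chloris"),
  name_y    = c("Cyanistes caeruleus", "Chloris chloris"),
  user_note = c("genus split", "genus split"),
  stringsAsFactors = FALSE
)

# The same table can be stored as a CSV and supplied as a file path instead.
overrides_path <- tempfile(fileext = ".csv")
write.csv(overrides, overrides_path, row.names = FALSE)

# Round-trip check: the CSV carries the required columns.
overrides_from_file <- read.csv(overrides_path, stringsAsFactors = FALSE)
print(names(overrides_from_file))
```

Either overrides or overrides_path could then be passed as the overrides argument.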

db_version

A length-1 character vector. taxadb database snapshot to use (e.g. "22.12"). NULL (default) uses the latest available.

fuzzy

Logical. Enables the fuzzy-matching stage (stage 4) when TRUE. Default FALSE. Turn this on to catch likely typos (Corvus brachyrhnchos -> Corvus brachyrhynchos). When FALSE, stages 1–3 still run (stage 3 only if authority is not NULL).

fuzzy_threshold

Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9 (roughly "no more than ~10% of characters differ"). Lower values (e.g. 0.7) are more permissive but produce more false positives; always review fuzzy matches with reconcile_suggest() or reconcile_review() before trusting them.
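
To get a feel for the scale of the threshold, the sketch below scores the typo example with a plain normalised Levenshtein similarity from base R. This is a stand-in chosen for illustration: the package's score is genus-weighted, so its values will not match this simple measure exactly.

```r
# Plain normalised Levenshtein similarity: 1 - distance / longer length.
# An assumption for illustration; the package's genus-weighted score differs.
similarity <- function(a, b) {
  d <- as.vector(utils::adist(a, b))
  1 - d / max(nchar(a), nchar(b))
}

score     <- similarity("Corvus brachyrhnchos", "Corvus brachyrhynchos")
threshold <- 0.9

print(round(score, 3))
print(score >= threshold)
```

Here a single missing letter in a 21-character name gives a score of 1 - 1/21, which clears the default threshold of 0.9.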

resolve

A length-1 character vector. What to do with borderline matches:

"flag" (default): Mark low-confidence fuzzy matches (score below flag_threshold) and names with indirect taxadb synonymy as match_type = "flagged" so you can audit them with reconcile_review() or reconcile_suggest().
"first": Accept the highest-scoring candidate silently, without flagging. Faster but riskier; use only when you have already reviewed the ambiguities.

quiet

Logical. Suppresses progress messages when TRUE. Default FALSE.

Value

A reconciliation object. The mapping tibble gains one logical column per input dataset (e.g. in_morpho, in_nests) indicating which datasets contained each species.
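
The in_* columns make it straightforward to pick out species with complete coverage. The sketch below uses a small hand-made data frame as a stand-in for the mapping tibble; the name_x column and its contents are assumptions, and a real mapping has more columns.

```r
# Toy stand-in for the mapping tibble (real mappings have more columns).
mapping <- data.frame(
  name_x    = c("Parus major", "Corvus corax", "Motacilla alba"),
  in_morpho = c(TRUE, TRUE, FALSE),
  in_nests  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Find every in_* column, then keep rows where all of them are TRUE.
in_cols  <- grep("^in_", names(mapping), value = TRUE)
present  <- as.matrix(mapping[in_cols])
complete <- mapping[rowSums(present) == length(in_cols), ]

print(complete$name_x)
```

Because the in_* columns are found by pattern, the same filter works for any number of input datasets.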

Examples

data(avonet_subset)
data(nesttrait_subset)
data(tree_jetz)
datasets <- list(
  morpho = avonet_subset,
  nests  = nesttrait_subset
)
result <- reconcile_multi(datasets, tree_jetz,
                          species_cols = c("Species1", "Scientific_name"),
                          authority = NULL)
#> ℹ Reconciling 919 unique names from 2 datasets vs 657 tree tips
#> ℹ Matching 919 x 657 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔ Matched 657/919 unique names to tree tips
print(result)
#> 
#> ── Reconciliation: multiple datasets vs tree ───────────────────────────────────
#>   Source x: morpho, nests
#>   Source y: phylo (657 tips)
#>   Authority: none
#>   Timestamp: 2026-06-16 10:09:53
#> ℹ Match coverage: [█████████████████████░░░░░░░░░] 71% (657/919)
#> 
#> ── Match summary ──
#> 
#> • Exact: 0 ( 0.0%)
#> • Normalized: 657 (71.5%)
#> • Synonym: 0 ( 0.0%)
#> • Fuzzy: 0 ( 0.0%)
#> • Manual: 0 ( 0.0%)
#> ! Unresolved (x only):262 (28.5%)
#> ! Unresolved (y only):0
#> ! Flagged for review: 0
#> ℹ Use `reconcile_summary()` for details, `reconcile_mapping()` for the full table.
