
Match several trait or occurrence datasets against a single phylogenetic tree in one call. Species that appear in more than one dataset are reconciled once; the combined mapping records which dataset(s) each species belongs to, making it easy to identify the set of species with complete trait coverage.
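Reconciling each species once can be sketched in base R. Pool the names from every dataset, keep each distinct name a single time, and record dataset membership in one logical column per dataset. The toy data frames and species below are made up. This only illustrates the bookkeeping and is not the package's internal code.

```r
# Toy datasets (hypothetical) whose species columns have different names
morpho <- data.frame(Species1 = c("Parus major", "Corvus corax", "Pica pica"))
nests  <- data.frame(Scientific_name = c("Pica pica", "Parus major", "Sitta europaea"))
datasets     <- list(morpho = morpho, nests = nests)
species_cols <- c("Species1", "Scientific_name")

# Distinct species names per dataset (columns paired with datasets by position)
name_lists <- Map(function(df, col) unique(df[[col]]), datasets, species_cols)

# Pool all distinct names: each one would go through the matching cascade once
all_names <- unique(unlist(name_lists, use.names = FALSE))

# One logical membership column per dataset, named in_<label>
membership <- data.frame(name = all_names)
for (label in names(name_lists)) {
  membership[[paste0("in_", label)]] <- all_names %in% name_lists[[label]]
}
print(membership)
```

Here six input rows reduce to four distinct names. Rows where every in_ column is TRUE are the species with complete coverage.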

Usage

reconcile_multi(
  datasets,
  tree,
  species_cols = NULL,
  authority = "col",
  rank = c("species", "subspecies"),
  overrides = NULL,
  db_version = NULL,
  fuzzy = FALSE,
  fuzzy_threshold = 0.9,
  resolve = c("flag", "first"),
  quiet = FALSE
)

Arguments

datasets

A named list of data frames. The names are used as dataset labels (e.g. morpho, nests, plumage) in the output.

tree

An ape::phylo object, or a path to a Newick/Nexus file.

species_cols

Character vector giving the name of the species column in each dataset. Supply either one name per dataset or a single name, which is then used for every dataset. If NULL (the default), the column is auto-detected in each data frame.
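The recycling rule for species_cols can be sketched as follows. expand_species_cols() is a hypothetical helper. Pairing column names with datasets by position is an assumption based on the Examples section. Auto-detection (species_cols = NULL) is not modelled.

```r
# Hypothetical helper: one species column name per dataset
expand_species_cols <- function(species_cols, datasets) {
  if (length(species_cols) == 1L) {
    # A single name is reused for every dataset
    species_cols <- rep(species_cols, length(datasets))
  }
  if (length(species_cols) != length(datasets)) {
    stop("`species_cols` must have length 1 or one entry per dataset")
  }
  # Assumed positional pairing: label each column name with its dataset
  stats::setNames(species_cols, names(datasets))
}

datasets <- list(morpho = data.frame(), nests = data.frame())
expand_species_cols("species", datasets)
expand_species_cols(c("Species1", "Scientific_name"), datasets)
```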

authority

A length-1 character vector, or NULL. Taxonomic authority used for synonym resolution (stage 3 of the cascade). One of:

"col" (default)

Catalogue of Life — broad, curated, frequently updated. A sensible default for most taxa.

"itis"

Integrated Taxonomic Information System — strong for North American vertebrates and plants.

"gbif"

Global Biodiversity Information Facility backbone. Wider coverage; includes more recent synonymy.

"ncbi"

NCBI Taxonomy — best when working with sequence data.

"ott"

Open Tree of Life synthetic taxonomy. Useful when your downstream phylogeny is from the Open Tree synthesis.

"itis_test"

A small bundled subset of ITIS, cached locally with taxadb for testing. Intended for examples and unit tests; not for analysis.

"gnverifier"

HTTP-backed verification against ~100 sources via the Global Names verifier; no local database download. See vignette("getting-started") for the trade-off (wider coverage, requires network and the httr2 package).

NULL

Skip the synonym stage entirely. Useful for quick checks or when taxadb is unavailable. Stages 1 and 2 still run, as does stage 4 when fuzzy = TRUE.

Five authority codes that earlier versions of the package advertised ("iucn", "tpl", "fb", "slb", "wd") are no longer accepted. Testing against taxadb v22.12 showed that "iucn" fails with a schema mismatch and that the other four are not taxadb providers at all. Passing any of these values now raises an error that explains how to migrate.

rank

A length-1 character vector. Controls how trinomials are handled during normalisation:

"species" (default)

Strip infraspecific epithets so that "Parus major major" becomes "Parus major" before matching.

"subspecies"

Keep trinomials intact. Use this when your analysis operates at subspecies level.
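The effect of rank = "species" can be approximated by keeping only the first two words of each name. to_binomial() is a hypothetical helper, not the package's normaliser. It ignores markers such as "subsp." or "var.", authorship strings and hybrid signs, all of which a real normaliser has to handle.

```r
# Hypothetical helper: reduce each name to genus + specific epithet
to_binomial <- function(x) {
  words <- strsplit(trimws(x), "\\s+")
  vapply(words, function(w) paste(head(w, 2L), collapse = " "), character(1))
}

to_binomial(c("Parus major major", "Corvus corax", "Passer domesticus  domesticus"))
# returns c("Parus major", "Corvus corax", "Passer domesticus")
```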

overrides

Optional pre-built corrections. Either a data frame with at least columns name_x and name_y (plus an optional user_note column), or a file path to a CSV with the same columns. Any name listed here bypasses the cascade and is recorded as match_type = "manual". Useful for applying published crosswalks (see reconcile_crosswalk()) or for locking down decisions made in a previous run.
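The following minimal sketch prepares overrides in base R and saves them as a CSV for reuse. The two name pairs are illustrative genus changes, not corrections taken from the bundled datasets. The printed output in the Examples lists the datasets as Source x and the tree as Source y. Judging from that, name_x would hold the dataset spelling and name_y the tree name. Check that name_y matches your tree's tip labels exactly, for example whether tips use underscores or spaces.

```r
# Illustrative corrections: dataset spelling (name_x) -> tree name (name_y)
overrides <- data.frame(
  name_x    = c("Parus caeruleus", "Carduelis chloris"),
  name_y    = c("Cyanistes caeruleus", "Chloris chloris"),
  user_note = c("Moved from Parus to Cyanistes", "Moved from Carduelis to Chloris")
)

# Save the decisions so a later run can reuse them via a file path
path <- tempfile(fileext = ".csv")
write.csv(overrides, path, row.names = FALSE)
reloaded <- read.csv(path)

# Either form can then be passed on, e.g.
# reconcile_multi(datasets, tree, overrides = overrides)
# reconcile_multi(datasets, tree, overrides = path)
```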

db_version

A length-1 character vector. taxadb database snapshot to use (e.g. "22.12"). NULL (default) uses the latest available.

fuzzy

Logical. If TRUE, enables the fuzzy-matching stage (stage 4 of the cascade), which catches likely typos (Corvus brachyrhnchos -> Corvus brachyrhynchos). Default FALSE. When FALSE, stages 1–3 still run.

fuzzy_threshold

Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9 (roughly "no more than 10% of characters differ"). Lower values (e.g. 0.7) are more permissive but produce more false positives; always review fuzzy matches with reconcile_suggest() or reconcile_review() before trusting them.
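The sketch below gives a feel for what a threshold of 0.9 means. It uses a plain normalised edit-distance similarity built on base R's utils::adist(). This is a stand-in, not the package's genus-weighted score, so exact values will differ. The point is the scale.

```r
# Plain normalised Levenshtein similarity (NOT the package's genus-weighted score)
name_similarity <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]          # number of single-character edits
  1 - d / max(nchar(a), nchar(b))
}

# A one-letter typo: 1 edit over 21 characters
typo <- name_similarity("Corvus brachyrhnchos", "Corvus brachyrhynchos")
typo                     # 20/21, about 0.952: accepted at 0.9

# Two different species in the same genus: 2 edits over 11 characters
other <- name_similarity("Parus major", "Parus minor")
other                    # 9/11, about 0.818: rejected at 0.9, accepted at 0.7
```

The second pair shows why permissive thresholds need review. "Parus major" and "Parus minor" are distinct names, yet they clear a 0.7 cut-off under this measure.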

resolve

A length-1 character vector. What to do with borderline matches:

"flag" (default)

Mark low-confidence fuzzy matches (score below flag_threshold) and names with indirect taxadb synonymy as match_type = "flagged" so you can audit them with reconcile_review() or reconcile_suggest().

"first"

Accept the highest-scoring candidate silently, without flagging. Faster but riskier; use only when you have already reviewed the ambiguities.
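The difference between the two modes can be sketched for a single name with several fuzzy candidates. pick_candidate() is hypothetical. flag_threshold is not an argument of reconcile_multi() in the Usage above, so the value 0.95 is an assumption. The indirect-synonymy case is not modelled.

```r
# Hypothetical decision rule for a name with several fuzzy candidates.
# flag_threshold = 0.95 is an assumed value, not a documented default.
pick_candidate <- function(candidates, scores, resolve = c("flag", "first"),
                           flag_threshold = 0.95) {
  resolve <- match.arg(resolve)
  best <- which.max(scores)              # highest-scoring candidate
  match_type <- "fuzzy"
  if (resolve == "flag" && scores[best] < flag_threshold) {
    match_type <- "flagged"              # keep it, but mark it for review
  }
  list(name = candidates[best], score = scores[best], match_type = match_type)
}

cands  <- c("Sylvia atricapilla", "Sylvia borin")
scores <- c(0.91, 0.74)                  # made-up similarity scores
pick_candidate(cands, scores, resolve = "flag")$match_type   # "flagged"
pick_candidate(cands, scores, resolve = "first")$match_type  # "fuzzy"
```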

quiet

Logical. Suppresses progress messages when TRUE. Default FALSE.

Value

A reconciliation object. Its mapping tibble gains one logical column per input dataset, named in_ followed by the dataset label (e.g. in_morpho, in_nests), indicating which datasets contained each species.
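Each dataset contributes one in_ column, so species with complete coverage are the rows where every such column is TRUE. The sketch below applies that filter to a small hand-made table. On a real result the table would come from reconcile_mapping(). The name_x column here is an assumed name, not a documented one, and you may also want to keep only rows that were actually matched to a tip.

```r
# Hand-made stand-in for the mapping tibble; only the in_* columns follow the docs
mapping <- data.frame(
  name_x    = c("Parus major", "Corvus corax", "Pica pica", "Sitta europaea"),
  in_morpho = c(TRUE, TRUE, TRUE, FALSE),
  in_nests  = c(TRUE, FALSE, TRUE, TRUE)
)

# Find every membership column, whatever the dataset labels are
in_cols <- grep("^in_", names(mapping), value = TRUE)

# Species present in all datasets: AND across the in_* columns
complete <- mapping[Reduce(`&`, mapping[in_cols]), ]
complete$name_x   # "Parus major" "Pica pica"
```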

Examples

data(avonet_subset)
data(nesttrait_subset)
data(tree_jetz)
datasets <- list(
  morpho = avonet_subset,
  nests  = nesttrait_subset
)
result <- reconcile_multi(datasets, tree_jetz,
                          species_cols = c("Species1", "Scientific_name"),
                          authority = NULL)
#>  Reconciling 919 unique names from 2 datasets vs 657 tree tips
#>  Matching 919 x 657 names through 2 stages...
#>  Stage 1/2: Exact matching...
#>  Stage 2/2: Normalised matching (0 matched so far)...
#>  Matched 657/919 unique names to tree tips
print(result)
#> 
#> ── Reconciliation: multiple datasets vs tree ───────────────────────────────────
#>   Source x: morpho, nests
#>   Source y: phylo (657 tips)
#>   Authority: none
#>   Timestamp: 2026-06-16 10:09:53
#>  Match coverage: [█████████████████████░░░░░░░░░] 71% (657/919)
#> 
#> ── Match summary ──
#> 
#>  Exact: 0 ( 0.0%)
#>  Normalized: 657 (71.5%)
#>  Synonym: 0 ( 0.0%)
#>  Fuzzy: 0 ( 0.0%)
#>  Manual: 0 ( 0.0%)
#> ! Unresolved (x only):262 (28.5%)
#> ! Unresolved (y only):0
#> ! Flagged for review: 0
#>  Use `reconcile_summary()` for details, `reconcile_mapping()` for the full table.