Reconcile species names between a dataset and a phylogenetic tree
Source: R/reconcile_tree.R
Match the species in a trait data frame (x) to the tip labels of a
phylogenetic tree (tree), producing a reconciliation object ready
to feed into reconcile_apply(), PGLS, phylogenetic GLMMs, ancestral
state reconstruction, or any other phylogenetic comparative method
(PCM). This is typically the first function you call in a prepR4pcm
workflow.
Arguments
- x
A data frame containing the trait data. Must have one column of scientific names.
- tree
An ape::phylo object, or a length-1 character vector giving the path to a Newick (.nwk, .tre, .tree) or Nexus (.nex, .nexus) file. File format is auto-detected.
- x_species
A length-1 character vector. Name of the column in x containing scientific names (the same column referenced by x above; the term “species names” elsewhere in this help page is a synonym for the same scientific names). When NULL, the column is auto-detected from a small list of common labels (e.g. species, Species1, scientific_name); the list is not exhaustive, so pass the column name explicitly if your data uses a non-standard label.
- authority
A length-1 character vector, or NULL. Taxonomic authority used for synonym resolution (stage 3 of the cascade). One of:
"col" (default): Catalogue of Life. Broad, curated, frequently updated. A sensible default for most taxa.
"itis": Integrated Taxonomic Information System. Strong for North American vertebrates and plants.
"gbif": Global Biodiversity Information Facility backbone. Wider coverage; includes more recent synonymy.
"ncbi": NCBI Taxonomy. Best when working with sequence data.
"ott": Open Tree of Life synthetic taxonomy. Useful when your downstream phylogeny is from the Open Tree synthesis.
"itis_test": A small bundled subset of ITIS, cached locally with taxadb for testing. Intended for examples and unit tests; not for analysis.
"gnverifier": HTTP-backed verification against ~100 sources via the Global Names verifier; no local database download. See vignette("getting-started") for the trade-off (wider coverage, requires network and the httr2 package).
NULL: Skip the synonym stage entirely. Useful for quick checks or when taxadb is unavailable. Stages 1, 2 and 4 still run.
Five authority codes that earlier versions of the package advertised ("iucn", "tpl", "fb", "slb", "wd") are no longer accepted. Empirical testing against taxadb v22.12 showed that iucn errors with a schema mismatch and the others are not taxadb providers at all. Passing one of those values now produces a helpful migration error.
- rank
A length-1 character vector. Controls how trinomials are handled during normalisation:
"species" (default): Strip infraspecific epithets so that "Parus major major" becomes "Parus major" before matching.
"subspecies": Keep trinomials intact. Use this when your analysis operates at subspecies level.
- overrides
Optional pre-built corrections. Either a data frame with at least columns name_x and name_y (plus an optional user_note column), or a file path to a CSV with the same columns. Any name listed here bypasses the cascade and is recorded as match_type = "manual". Useful for applying published crosswalks (see reconcile_crosswalk()) or for locking down decisions made in a previous run.
- db_version
A length-1 character vector. taxadb database snapshot to use (e.g. "22.12"). NULL (default) uses the latest available.
- fuzzy
Logical. Enables the fuzzy-matching stage when TRUE. Default FALSE. Turn this on to catch likely typos (Corvus brachyrhnchos -> Corvus brachyrhynchos). When FALSE, stages 1–3 still run.
- fuzzy_threshold
Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9 (roughly "no more than ~10% of characters differ"). Lower values (e.g. 0.7) are more permissive but produce more false positives; always review fuzzy matches with reconcile_suggest() or reconcile_review() before trusting them.
- flag_threshold
Numeric in [0, 1]. When resolve = "flag", fuzzy matches with a score below this value are recorded as match_type = "flagged" rather than "fuzzy", marking them for manual review. Default 0.95. Must be >= fuzzy_threshold to have any effect.
- resolve
A length-1 character vector. What to do with borderline matches:
"flag" (default): Mark low-confidence fuzzy matches (score below flag_threshold) and names with indirect taxadb synonymy as match_type = "flagged" so you can audit them with reconcile_review() or reconcile_suggest().
"first": Accept the highest-scoring candidate silently, without flagging. Faster but riskier; use only when you have already reviewed the ambiguities.
- quiet
Logical. Suppresses progress messages when TRUE. Default FALSE.
- x_label
A length-1 character vector or NULL. Human-readable label for source x, stored in the reconciliation metadata and shown in print() / format(). Defaults to the expression passed as x (via deparse(substitute())). Set this explicitly when calling this function inside another function so the label reflects the real data source rather than the local argument name.
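The interplay between fuzzy_threshold, flag_threshold and resolve can be sketched in base R. This is a minimal illustration built only from the argument descriptions above, not the package's implementation: classify_fuzzy() is a hypothetical helper, and the treatment of scores that fall exactly on a threshold is an assumption.

```r
# Hypothetical helper (not part of prepR4pcm): label fuzzy candidates by score.
# Assumption: a score equal to fuzzy_threshold counts as accepted.
classify_fuzzy <- function(score,
                           fuzzy_threshold = 0.9,
                           flag_threshold  = 0.95,
                           resolve         = c("flag", "first")) {
  resolve <- match.arg(resolve)
  out <- rep("unresolved", length(score))

  # Candidates at or above fuzzy_threshold are accepted as fuzzy matches.
  accepted <- score >= fuzzy_threshold
  out[accepted] <- "fuzzy"

  # With resolve = "flag", accepted matches below flag_threshold are
  # marked for manual review instead.
  if (resolve == "flag") {
    out[accepted & score < flag_threshold] <- "flagged"
  }
  out
}

scores <- c(0.99, 0.93, 0.85)
classify_fuzzy(scores)
#> [1] "fuzzy"      "flagged"    "unresolved"
classify_fuzzy(scores, resolve = "first")
#> [1] "fuzzy"      "fuzzy"      "unresolved"
```

Under these assumptions, setting flag_threshold below fuzzy_threshold leaves nothing to flag, which is why the argument description asks for flag_threshold >= fuzzy_threshold.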
Value
A reconciliation object with meta$type == "data_tree".
The mapping tibble has one row per unique name: matched species
(in_x & in_y), data-only orphans (in_x & !in_y, candidates for
reconcile_augment()), and tree-only orphans (!in_x & in_y,
candidates for reconcile_apply() to prune).
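The three row categories can be illustrated with a toy table in base R. Only the in_x and in_y flags come from the description above; the name and status columns and the species used are placeholders for illustration, and a real mapping tibble carries more columns.

```r
# Toy stand-in for a mapping table; not the package's actual structure.
mapping <- data.frame(
  name = c("Parus major", "Corvus corax", "Acanthiza nana"),
  in_x = c(TRUE, TRUE, FALSE),
  in_y = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Derive a readable status from the two logical flags.
mapping$status <- ifelse(
  mapping$in_x & mapping$in_y, "matched",
  ifelse(mapping$in_x, "data-only orphan", "tree-only orphan")
)

# Names that are candidates for grafting versus pruning.
data_only <- mapping$name[mapping$in_x & !mapping$in_y]
tree_only <- mapping$name[!mapping$in_x & mapping$in_y]
```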
Details
Internally, reconcile_tree() treats the tree's tip labels as the
y argument of reconcile_data() and runs the same four-stage
matching cascade (exact -> normalized -> synonym -> fuzzy). Tip labels
typically differ from data names only in formatting (underscores,
capitalisation, authority strings), so even with authority = NULL
you usually recover most matches at the normalized stage. Turn on
fuzzy = TRUE to also catch spelling mistakes.
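To see why the normalised stage recovers most tip labels, here is a rough base-R sketch of formatting-only matching followed by a simple fuzzy fallback. It is an approximation under stated assumptions, not the package's code: normalise_name() is a hypothetical helper, truncating to the first two words stands in for both trinomial and authority-string stripping, and plain edit-distance similarity from utils::adist() replaces the genus-weighted score described above.

```r
# Hypothetical normaliser: underscores to spaces, collapse whitespace,
# genus capitalised, epithet lower-case, optional truncation to a binomial.
normalise_name <- function(x, rank = c("species", "subspecies")) {
  rank <- match.arg(rank)
  x <- gsub("_", " ", x, fixed = TRUE)
  x <- trimws(gsub("\\s+", " ", x, perl = TRUE))
  x <- tolower(x)
  x <- paste0(toupper(substring(x, 1, 1)), substring(x, 2))
  if (rank == "species") {
    # Keep only the first two words (genus + specific epithet).
    x <- sub("^(\\S+ \\S+).*$", "\\1", x, perl = TRUE)
  }
  x
}

data_names <- c("Acanthiza apicalis", "Parus major major", "Corvus brachyrhnchos")
tip_labels <- c("Acanthiza_apicalis", "Parus_major", "Corvus_brachyrhynchos")

# Normalised stage: exact match after formatting differences are removed.
idx <- match(normalise_name(data_names), normalise_name(tip_labels))

# Fuzzy fallback for the one name still unmatched (simplified similarity).
query <- normalise_name(data_names[is.na(idx)])
cand  <- normalise_name(tip_labels)
d     <- drop(utils::adist(query, cand))
sim   <- 1 - d / pmax(nchar(query), nchar(cand))
best  <- which.max(sim)
tip_labels[best]
#> [1] "Corvus_brachyrhynchos"
```

The first two names match on formatting alone; only the misspelled name needs the fuzzy step, where a single missing character keeps the similarity above the default fuzzy_threshold of 0.9.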
After reconciliation, the typical workflow is:
1. Inspect with reconcile_summary() or reconcile_plot().
2. Investigate unresolved names with reconcile_suggest() and fix them with reconcile_override() or reconcile_override_batch().
3. Produce an aligned data frame and pruned tree via reconcile_apply().
4. Optionally, graft orphan species onto the tree with reconcile_augment() (exploratory only; always run sensitivity analyses).
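Decisions made while fixing unresolved names can be kept for later runs through the overrides argument, which accepts a CSV with name_x and name_y columns and an optional user_note column. The sketch below writes such a file with base R; the example row, the note text and the assumption that name_y should hold the tip label exactly as it appears in the tree are illustrative, and my_data / my_tree in the commented call are placeholders.

```r
# One manual correction, using the column names required by `overrides`.
overrides <- data.frame(
  name_x    = "Corvus brachyrhnchos",    # name as it appears in the data
  name_y    = "Corvus_brachyrhynchos",   # assumed: tip label as in the tree
  user_note = "typo in data sheet",      # optional free-text note
  stringsAsFactors = FALSE
)

# Save to a temporary CSV and read it back to confirm the layout.
path <- tempfile(fileext = ".csv")
write.csv(overrides, path, row.names = FALSE)
check <- read.csv(path, stringsAsFactors = FALSE)
stopifnot(all(c("name_x", "name_y") %in% names(check)))

# Reuse in a later run (requires prepR4pcm; objects are placeholders):
# rec <- reconcile_tree(my_data, my_tree, overrides = path)
```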
References
Paradis, E. & Schliep, K. (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35:526–528. doi:10.1093/bioinformatics/bty633
See also
reconcile_apply() to produce an aligned data-tree pair;
reconcile_augment() to add orphan species back to the tree;
reconcile_to_trees() to reconcile against several trees at once;
reconcile_data() for the data-only counterpart.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_trees()
Examples
# Reconcile the bundled AVONET subset against the Jetz et al. (2012)
# bird tree. `authority = NULL` keeps the example offline; in a real
# analysis you would usually set `authority = "col"` (Catalogue of
# Life) to pick up taxonomic synonyms.
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(
  avonet_subset, tree_jetz,
  x_species = "Species1",
  authority = NULL,
  fuzzy     = TRUE  # also catch typos
)
#> ℹ Reconciling 919 data names vs 657 tree tips
#> ℹ Matching 919 x 657 names through 3 stages...
#> ℹ Stage 1/3: Exact matching...
#> ℹ Stage 2/3: Normalised matching (0 matched so far)...
#> ℹ Stage 3/3: Fuzzy matching (657 matched so far)...
#> ✔ Matched 657/919 data names to tree tips
rec  # print a compact status overview
#>
#> ── Reconciliation: data vs tree ────────────────────────────────────────────────
#> Source x: avonet_subset
#> Source y: phylo (657 tips)
#> Authority: none
#> Timestamp: 2026-06-16 10:09:58
#> ℹ Match coverage: [█████████████████████░░░░░░░░░] 71% (657/919)
#>
#> ── Match summary ──
#>
#> • Exact: 0 ( 0.0%)
#> • Normalized: 657 (71.5%)
#> • Synonym: 0 ( 0.0%)
#> • Fuzzy: 0 ( 0.0%)
#> • Manual: 0 ( 0.0%)
#> ! Unresolved (x only):262 (28.5%)
#> ! Unresolved (y only):0
#> ! Flagged for review: 0
#> ℹ Use `reconcile_summary()` for details, `reconcile_mapping()` for the full table.
reconcile_summary(rec) # full breakdown by match type
#>
#> === Reconciliation Report ===
#> Type: data_tree
#> Timestamp: 2026-06-16 10:09:58
#> Package: prepR4pcm 0.4.0.9000
#> Authority: NONE (version: latest)
#> Rank: species
#>
#> --- Match Summary ---
#> Exact: 0 / 919
#> Normalized: 657 / 919
#> Synonym: 0 / 919
#> Fuzzy: 0 / 919
#> Manual: 0 / 919
#> Unresolved: 262 (x only) + 0 (y only)
#>
#> --- Normalized Matches (657) ---
#> "Acanthiza apicalis" -> "Acanthiza_apicalis" ['Acanthiza apicalis' normalised to 'Acanthiza apicalis']
#> "Acanthiza chrysorrhoa" -> "Acanthiza_chrysorrhoa" ['Acanthiza chrysorrhoa' normalised to 'Acanthiza chrysorrhoa']
#> "Acanthiza ewingii" -> "Acanthiza_ewingii" ['Acanthiza ewingii' normalised to 'Acanthiza ewingii']
#> "Acanthiza inornata" -> "Acanthiza_inornata" ['Acanthiza inornata' normalised to 'Acanthiza inornata']
#> "Acanthiza iredalei" -> "Acanthiza_iredalei" ['Acanthiza iredalei' normalised to 'Acanthiza iredalei']
#> "Acanthiza katherina" -> "Acanthiza_katherina" ['Acanthiza katherina' normalised to 'Acanthiza katherina']
#> "Acanthiza lineata" -> "Acanthiza_lineata" ['Acanthiza lineata' normalised to 'Acanthiza lineata']
#> "Acanthiza murina" -> "Acanthiza_murina" ['Acanthiza murina' normalised to 'Acanthiza murina']
#> "Acanthiza nana" -> "Acanthiza_nana" ['Acanthiza nana' normalised to 'Acanthiza nana']
#> "Acanthiza pusilla" -> "Acanthiza_pusilla" ['Acanthiza pusilla' normalised to 'Acanthiza pusilla']
#> "Acanthiza reguloides" -> "Acanthiza_reguloides" ['Acanthiza reguloides' normalised to 'Acanthiza reguloides']
#> "Acanthiza robustirostris" -> "Acanthiza_robustirostris" ['Acanthiza robustirostris' normalised to 'Acanthiza robustirostris']
#> "Acanthiza uropygialis" -> "Acanthiza_uropygialis" ['Acanthiza uropygialis' normalised to 'Acanthiza uropygialis']
#> "Acanthornis magna" -> "Acanthornis_magna" ['Acanthornis magna' normalised to 'Acanthornis magna']
#> "Aphelocephala leucopsis" -> "Aphelocephala_leucopsis" ['Aphelocephala leucopsis' normalised to 'Aphelocephala leucopsis']
#> "Aphelocephala nigricincta" -> "Aphelocephala_nigricincta" ['Aphelocephala nigricincta' normalised to 'Aphelocephala nigricincta']
#> "Aphelocephala pectoralis" -> "Aphelocephala_pectoralis" ['Aphelocephala pectoralis' normalised to 'Aphelocephala pectoralis']
#> "Calamanthus campestris" -> "Calamanthus_campestris" ['Calamanthus campestris' normalised to 'Calamanthus campestris']
#> "Calamanthus fuliginosus" -> "Calamanthus_fuliginosus" ['Calamanthus fuliginosus' normalised to 'Calamanthus fuliginosus']
#> "Crateroscelis murina" -> "Crateroscelis_murina" ['Crateroscelis murina' normalised to 'Crateroscelis murina']
#> ... and 637 more
#>
#> --- Unresolved: In x But Not In y (262) ---
#> Acanthiza cinerea
#> Calamanthus cautus
#> Calamanthus montanellus
#> Calamanthus pyrrhopygius
#> Gerygone citrina
#> Pyrrholaemus sagittatus
#> Artamus leucoryn
#> Cracticus argenteus
#> Melloria quoyi
#> Ceblepyris caesius
#> Ceblepyris cinereus
#> Ceblepyris cucullatus
#> Ceblepyris graueri
#> Ceblepyris pectoralis
#> Celebesica abbotti
#> Coracina dobsoni
#> Coracina panayensis
#> Coracina welchmani
#> Cyanograucalus azureus
#> Edolisoma anale
#> Edolisoma ceramense
#> Edolisoma coerulescens
#> Edolisoma dispar
#> Edolisoma dohertyi
#> Edolisoma grayi
#> Edolisoma holopolium
#> Edolisoma incertum
#> Edolisoma insperatum
#> Edolisoma melas
#> Edolisoma meyerii
#> ... and 232 more
#>
# Produce aligned data + pruned tree ready for PGLS / PGLMM
aligned <- reconcile_apply(
  rec,
  data            = avonet_subset,
  tree            = tree_jetz,
  species_col     = "Species1",
  drop_unresolved = TRUE
)
#> ! Dropped 262 rows with unresolved species from data
#> ℹ Tree has 657 tips after alignment
nrow(aligned$data)
#> [1] 657
ape::Ntip(aligned$tree)
#> [1] 657