Match the species column of one data frame (x) to the species column
of another (y), returning a reconciliation object that records how
every name was resolved. Use this when combining trait datasets, range
datasets, or any other species-level tables that may use slightly
different taxonomies or spellings.
Arguments
- x
A data frame containing the species names to be matched.
- y
A data frame containing the species names to match against (typically the "reference" taxonomy or the dataset you want to merge with).
- x_species
A length-1 character vector. Name of the column in x containing scientific names. Auto-detected (e.g. species, Species1, scientific_name) when NULL.
- y_species
A length-1 character vector. Name of the column in y containing scientific names. Auto-detected when NULL.
- authority
A length-1 character vector, or NULL. Taxonomic authority used for synonym resolution (stage 3 of the cascade). One of:
"col" (default): Catalogue of Life. Broad, curated and frequently updated; a sensible default for most taxa.
"itis": Integrated Taxonomic Information System. Strong for North American vertebrates and plants.
"gbif": Global Biodiversity Information Facility backbone. Wider coverage; includes more recent synonymy.
"ncbi": NCBI Taxonomy. Best when working with sequence data.
"ott": Open Tree of Life synthetic taxonomy. Useful when your downstream phylogeny comes from the Open Tree synthesis.
"itis_test": A small bundled subset of ITIS, cached locally with taxadb for testing. Intended for examples and unit tests; not for analysis.
"gnverifier": HTTP-backed verification against ~100 sources via the Global Names verifier; no local database download. See vignette("getting-started") for the trade-off (wider coverage, but requires network access and the httr2 package).
NULL: Skip the synonym stage entirely. Useful for quick checks or when taxadb is unavailable. Stages 1 and 2 still run, as does stage 4 when fuzzy = TRUE.
Five authority codes that earlier versions of the package advertised ("iucn", "tpl", "fb", "slb", "wd") are no longer accepted. Empirical testing against taxadb v22.12 showed that "iucn" errors with a schema mismatch and the others are not taxadb providers at all. Passing any of these values now produces an informative migration error.
- rank
A length-1 character vector. Controls how trinomials are handled during normalisation:
"species" (default): Strip infraspecific epithets so that "Parus major major" becomes "Parus major" before matching.
"subspecies": Keep trinomials intact. Use this when your analysis operates at subspecies level.
- overrides
Optional pre-built corrections: either a data frame with at least the columns name_x and name_y (plus an optional user_note column), or a file path to a CSV with the same columns. Any name listed here bypasses the cascade and is recorded as match_type = "manual". Useful for applying published crosswalks (see reconcile_crosswalk()) or for locking in decisions made in a previous run.
- db_version
A length-1 character vector. taxadb database snapshot to use (e.g. "22.12"). NULL (default) uses the latest available snapshot.
- fuzzy
Logical. If TRUE, enables the fuzzy-matching stage (stage 4 of the cascade). Default FALSE. Turn this on to catch likely typos (e.g. Corvus brachyrhnchos -> Corvus brachyrhynchos). When FALSE, stages 1–3 still run.
- fuzzy_threshold
Numeric in [0, 1]. Minimum genus-weighted similarity score for a fuzzy match to be accepted. Default 0.9 (roughly "no more than ~10% of characters differ"). Lower values (e.g. 0.7) are more permissive but produce more false positives; always review fuzzy matches with reconcile_suggest() or reconcile_review() before trusting them.
- flag_threshold
Numeric in [0, 1]. When resolve = "flag", fuzzy matches with a score below this value are recorded as match_type = "flagged" rather than "fuzzy", marking them for manual review. Default 0.95. Must be >= fuzzy_threshold to have any effect.
- resolve
A length-1 character vector. What to do with borderline matches:
"flag" (default): Mark low-confidence fuzzy matches (score below flag_threshold) and names with indirect taxadb synonymy as match_type = "flagged" so you can audit them with reconcile_review() or reconcile_suggest().
"first": Accept the highest-scoring candidate silently, without flagging. Faster but riskier; use only when you have already reviewed the ambiguities.
- quiet
Logical. If TRUE, suppresses progress messages. Default FALSE.
- x_label
A length-1 character vector or NULL. Human-readable label for source x, stored in the reconciliation metadata and shown by print() and format(). Defaults to the expression passed as x (via deparse(substitute())). Set this explicitly when calling reconcile_data() inside another function so the label reflects the real data source rather than the local argument name.
- y_label
A length-1 character vector or NULL. Same as x_label, for source y.
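The interplay between fuzzy_threshold, flag_threshold and resolve is easier to see on concrete scores. The base R sketch below classifies a single fuzzy candidate under the rules described above. classify_fuzzy() is a hypothetical helper, not a package function, and it models only fuzzy scores, not the flagging of indirect taxadb synonymy.

```r
# Hypothetical helper (not part of prepR4pcm): classify one fuzzy candidate
# by its genus-weighted score, following the argument descriptions above.
classify_fuzzy <- function(score, fuzzy_threshold = 0.9,
                           flag_threshold = 0.95,
                           resolve = c("flag", "first")) {
  resolve <- match.arg(resolve)
  if (score < fuzzy_threshold) {
    return("unresolved")   # rejected; fuzzy is the last stage of the cascade
  }
  if (resolve == "flag" && score < flag_threshold) {
    return("flagged")      # accepted, but marked for manual review
  }
  "fuzzy"                  # accepted without a flag
}

scores <- c(0.85, 0.92, 0.97)
vapply(scores, classify_fuzzy, character(1))
# c("unresolved", "flagged", "fuzzy")
vapply(scores, classify_fuzzy, character(1), resolve = "first")
# c("unresolved", "fuzzy", "fuzzy")

# With flag_threshold below fuzzy_threshold, nothing is ever flagged
classify_fuzzy(0.92, fuzzy_threshold = 0.9, flag_threshold = 0.8)
# "fuzzy"
```

Any score that clears fuzzy_threshold also clears a lower flag_threshold, so the "flagged" branch becomes unreachable; that is why flag_threshold must be >= fuzzy_threshold to have any effect.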
Value
A reconciliation object. The accompanying mapping tibble, match-type counts, provenance metadata, and applied / unused override slots are documented in reconciliation. See "After the call" under Details for the most common next steps.
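The percentages in the printed summary are proportions of the names in x. As a minimal sketch, assuming a mapping-like data frame with one row per name in x and a match_type column (the column names here are assumptions, not the documented layout of the mapping tibble), the counts shown in the Examples can be tallied in base R:

```r
# Toy mapping: one row per name in x. Column names are assumptions.
mapping <- data.frame(
  name_x     = sprintf("species_%03d", 1:919),
  match_type = c(rep("exact", 916), rep("unresolved", 3))
)

# Fix the level order so stages with zero matches still appear
stage_levels <- c("exact", "normalized", "synonym", "fuzzy",
                  "manual", "unresolved")
counts <- table(factor(mapping$match_type, levels = stage_levels))

summary_tbl <- data.frame(
  match_type = names(counts),
  n          = as.integer(counts),
  pct        = round(100 * as.integer(counts) / nrow(mapping), 1)
)
summary_tbl
```

With these toy counts, exact gives 916 (99.7%) and unresolved gives 3 (0.3%), the same figures as the printed summary in the Examples.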
Details
Names are passed through a four-stage matching cascade, and the first
stage that returns a match is recorded in match_type:
exact: verbatim string equality.
normalized: equality after stripping underscores, authority strings ("Corvus corax Linnaeus, 1758"), diacritics, and case/whitespace differences.
synonym: lookup in a local taxonomic database via taxadb (Catalogue of Life, GBIF, ITIS, NCBI, ...), or via the Global Names verifier when authority = "gnverifier". Skipped if authority = NULL.
fuzzy: character-level similarity (opt-in via fuzzy = TRUE). Uses a genus-weighted Levenshtein score (60% genus, 40% specific epithet) with a genus pre-filter so that only plausibly similar genera are compared.
Names that remain unmatched after every enabled stage are labelled unresolved. Entries supplied through overrides take precedence over the cascade.
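To make the cascade concrete, here is a simplified base R sketch of stages 1, 2 and 4 for a single name. The synonym stage is left out (as with authority = NULL) and overrides are not modelled. All function names are hypothetical. The authority-stripping heuristic, the similarity normalisation (1 minus the Levenshtein distance from utils::adist() divided by the longer length) and the genus pre-filter cut-off of 0.7 are assumptions, not the package's actual implementation. Diacritic folding is also omitted because iconv() transliteration varies across platforms.

```r
# Hypothetical sketch of the matching cascade (stages 1, 2 and 4 only).
# None of these functions are part of prepR4pcm.

# Stage 2 helper: normalise a vector of names.
normalise_name <- function(x, rank = c("species", "subspecies")) {
  rank <- match.arg(rank)
  x <- gsub("_", " ", x, fixed = TRUE)   # underscores -> spaces
  x <- trimws(gsub("\\s+", " ", x))      # collapse and trim whitespace
  vapply(strsplit(x, " ", fixed = TRUE), function(w) {
    if (length(w) == 0) return("")
    # Keep the genus plus following epithets; stop at the first token that
    # looks like an authority (mixed case, punctuation or digits), such as
    # "Linnaeus," or "1758". This is an assumed heuristic.
    keep <- w[1]
    for (tok in w[-1]) {
      if (!grepl("^([[:lower:]-]+|[[:upper:]-]+)$", tok)) break
      keep <- c(keep, tok)
    }
    if (rank == "species") keep <- keep[seq_len(min(length(keep), 2))]
    tolower(paste(keep, collapse = " "))
  }, character(1))
}

# Similarity in [0, 1]: 1 - Levenshtein distance / length of the longer string.
name_similarity <- function(a, b) {
  1 - drop(utils::adist(a, b)) / max(nchar(a), nchar(b))
}

# Genus-weighted score: 60% genus, 40% specific epithet.
genus_weighted_score <- function(a, b) {
  pa <- strsplit(a, " ", fixed = TRUE)[[1]]
  pb <- strsplit(b, " ", fixed = TRUE)[[1]]
  0.6 * name_similarity(pa[1], pb[1]) + 0.4 * name_similarity(pa[2], pb[2])
}

# Run the cascade for one name; the first stage that matches wins.
match_one <- function(name, targets, fuzzy = FALSE, fuzzy_threshold = 0.9,
                      genus_prefilter = 0.7, rank = "species") {
  # Stage 1: exact string equality
  if (name %in% targets) return(c(match = name, match_type = "exact"))

  # Stage 2: equality after normalisation
  q <- normalise_name(name, rank)
  norm_targets <- normalise_name(targets, rank)
  hit <- match(q, norm_targets)
  if (!is.na(hit)) return(c(match = targets[hit], match_type = "normalized"))

  # Stage 3 (synonym lookup) is skipped, as with authority = NULL.

  # Stage 4: fuzzy, only against targets whose genus passes the pre-filter
  if (fuzzy) {
    q_genus <- sub(" .*", "", q)
    genus_ok <- vapply(sub(" .*", "", norm_targets),
                       function(g) name_similarity(q_genus, g) >= genus_prefilter,
                       logical(1))
    if (any(genus_ok)) {
      cand <- targets[genus_ok]
      scores <- vapply(norm_targets[genus_ok],
                       function(t) genus_weighted_score(q, t), numeric(1))
      best <- which.max(scores)
      if (scores[best] >= fuzzy_threshold) {
        return(c(match = cand[best], match_type = "fuzzy"))
      }
    }
  }
  c(match = NA_character_, match_type = "unresolved")
}

targets <- c("Corvus corax", "Corvus brachyrhynchos", "Parus major")
match_one("Corvus_corax", targets)                        # "normalized"
match_one("Parus major major", targets)                   # "normalized"
match_one("Corvus brachyrhnchos", targets)                # "unresolved"
match_one("Corvus brachyrhnchos", targets, fuzzy = TRUE)  # "fuzzy"
```

In the last call the genus pre-filter discards Parus, and the typo scores 0.6 + 0.4 * (1 - 1/14), about 0.97, which clears the default fuzzy_threshold of 0.9.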
After the call. A reconciliation object is the input to
most other functions in the package. Common next steps:
reconcile_summary(): human-readable breakdown of matches.
reconcile_plot(): one-glance bar/pie chart of match composition.
reconcile_mapping(): extract the full per-name tibble.
reconcile_suggest(): near-miss candidates for unresolved names.
reconcile_merge(): join the two datasets using the reconciliation as the species key.
reconcile_report(): shareable HTML audit trail.
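Using the reconciliation as the species key amounts to a two-step join: x to the mapping on x's own names, then the mapping to y on the resolved names. A minimal base R sketch with toy data follows. It is not reconcile_merge()'s implementation, and the mapping columns name_x and name_y are assumptions borrowed from the overrides format.

```r
# Toy inputs; species columns reuse the names from the Examples
x <- data.frame(Species1 = c("Corvus corax", "Parus_major", "Myzomela wahe"),
                trait_x  = c(1, 2, 3))
y <- data.frame(Scientific_name = c("Corvus corax", "Parus major"),
                trait_y         = c("a", "b"))

# Hypothetical mapping: name_x -> name_y, NA when unresolved
mapping <- data.frame(name_x     = c("Corvus corax", "Parus_major", "Myzomela wahe"),
                      name_y     = c("Corvus corax", "Parus major", NA),
                      match_type = c("exact", "normalized", "unresolved"))

# Step 1: attach the resolved name to each row of x (unresolved names drop out)
resolved <- mapping[!is.na(mapping$name_y), c("name_x", "name_y")]
keyed_x  <- merge(x, resolved, by.x = "Species1", by.y = "name_x")

# Step 2: inner join to y on the resolved name
merged <- merge(keyed_x, y, by.x = "name_y", by.y = "Scientific_name")
names(merged)[names(merged) == "name_y"] <- "species_resolved"
merged
```

Myzomela wahe falls out of the inner join because it has no counterpart in y, in the same way as the three unresolved Myzomela names in the Examples.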
References
Norman, K.E., Chamberlain, S. & Boettiger, C. (2020) taxadb: A high-performance local taxonomic database interface. Methods in Ecology and Evolution 11:1153–1159. doi:10.1111/2041-210X.13440
See also
reconcile_tree() for matching against a phylogenetic tree;
reconcile_to_trees() / reconcile_trees() / reconcile_multi()
for multi-input workflows.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
# Merge AVONET morphology with nest-site data. Both datasets use
# slightly different taxonomies; authority = NULL keeps the example
# offline (no taxadb download).
data(avonet_subset)
data(nesttrait_subset)
rec <- reconcile_data(avonet_subset, nesttrait_subset,
x_species = "Species1",
y_species = "Scientific_name",
authority = NULL)
#> ℹ Reconciling 919 names (x) vs 916 names (y)
#> ℹ Matching 919 x 916 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (916 matched so far)...
#> ✔ Matched 916/919 names from x
rec # concise print method
#>
#> ── Reconciliation: data vs data ────────────────────────────────────────────────
#> Source x: avonet_subset
#> Source y: nesttrait_subset
#> Authority: none
#> Timestamp: 2026-06-16 10:09:48
#> ℹ Match coverage: [██████████████████████████████] 100% (916/919)
#>
#> ── Match summary ──
#>
#> • Exact: 916 (99.7%)
#> • Normalized: 0 ( 0.0%)
#> • Synonym: 0 ( 0.0%)
#> • Fuzzy: 0 ( 0.0%)
#> • Manual: 0 ( 0.0%)
#> ! Unresolved (x only):3 ( 0.3%)
#> ! Unresolved (y only):0
#> ! Flagged for review: 0
#> ℹ Use `reconcile_summary()` for details, `reconcile_mapping()` for the full table.
reconcile_summary(rec) # full breakdown
#>
#> === Reconciliation Report ===
#> Type: data_data
#> Timestamp: 2026-06-16 10:09:48
#> Package: prepR4pcm 0.4.0.9000
#> Authority: NONE (version: latest)
#> Rank: species
#>
#> --- Match Summary ---
#> Exact: 916 / 919
#> Normalized: 0 / 919
#> Synonym: 0 / 919
#> Fuzzy: 0 / 919
#> Manual: 0 / 919
#> Unresolved: 3 (x only) + 0 (y only)
#>
#> --- Unresolved: In x But Not In y (3) ---
#> Myzomela irianawidodoae
#> Myzomela prawiradilagae
#> Myzomela wahe
#>
# Join the two datasets on the reconciled species key
merged <- reconcile_merge(rec, avonet_subset, nesttrait_subset,
species_col_x = "Species1",
species_col_y = "Scientific_name")
#> ✔ Merged 916 species (inner join)
head(merged[, c("species_resolved", "Family1", "Common_name")])
#> species_resolved Family1 Common_name
#> 1 Acanthagenys rufogularis Meliphagidae Spiny-cheeked Honeyeater
#> 2 Acanthiza apicalis Acanthizidae Inland Thornbill
#> 3 Acanthiza chrysorrhoa Acanthizidae Yellow-rumped Thornbill
#> 4 Acanthiza cinerea Acanthizidae Grey Thornbill
#> 5 Acanthiza ewingii Acanthizidae Tasmanian Thornbill
#> 6 Acanthiza inornata Acanthizidae Western Thornbill