
Species names in your dataset rarely match the tip labels of your phylogenetic tree exactly. Formatting differences (Homo_sapiens vs Homo sapiens), taxonomic synonymy (e.g. splits and lumps involving Corvus brachyrhynchos), and simple spelling mistakes cause species to be dropped silently from phylogenetic generalised least squares (PGLS), phylogenetic mixed models, and other phylogenetic comparative methods (PCMs). prepR4pcm is a toolkit that helps ecologists and evolutionary biologists detect and resolve these mismatches, audit every decision, and produce aligned data-tree pairs ready for downstream analysis.

Typical workflow

A minimal end-to-end pipeline looks like this:

# 1. Match your data frame to a tree
rec <- reconcile_tree(
  avonet_subset, tree_jetz,
  x_species = "Species1",
  fuzzy     = TRUE          # enable typo correction
)

# 2. Review what matched, what is flagged, what is unresolved
reconcile_summary(rec)
reconcile_plot(rec)
reconcile_suggest(rec)      # suggest near-misses for unresolved names

# 3. Correct any unresolved or flagged cases by hand
rec <- reconcile_override(
  rec,
  name_x = "Corvus brachyrhnchos",   # typo in data
  name_y = "Corvus_brachyrhynchos"   # tip label it should map to
)

# 4. Produce an aligned dataset and pruned tree
aligned <- reconcile_apply(
  rec,
  data            = avonet_subset,
  tree            = tree_jetz,
  species_col     = "Species1",
  drop_unresolved = TRUE
)

# 5. aligned$data and aligned$tree are ready for downstream PCM tools

Key concepts

Reconciliation object

The central data structure. Contains a mapping tibble (one row per source name, with match type and score), a meta list (reproducibility provenance), a counts summary, an overrides log of applied manual corrections, and an unused_overrides audit trail of overrides that could not be applied (e.g. when name_y is missing from the target). Returned by all reconcile_* matching functions. Inspect with reconcile_summary(), extract the table with reconcile_mapping(), and act on it with reconcile_apply(), reconcile_merge(), or reconcile_export().

Four-stage matching cascade

Names are resolved in this order, and the first stage that produces a match is recorded as match_type:

  1. exact — verbatim string equality.

  2. normalized — after removing underscores, fixing case, stripping authority strings (Corvus corax Linnaeus 1758), and applying diacritic folding.

  3. synonym — via a local taxonomic database (see taxadb) such as Catalogue of Life or GBIF.

  4. fuzzy — character-level similarity on the remaining unmatched names (opt-in via fuzzy = TRUE).

Manual overrides are applied on top of the cascade and recorded as match_type = "manual".
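To make the ordering concrete, here is a minimal base-R sketch of a cascade of this kind. It is not the package's implementation: normalize_name(), match_cascade(), the synonyms lookup vector, the max_dist cut-off, and the "unresolved" label are all hypothetical stand-ins. Diacritic folding is omitted, and the edit distance comes from utils::adist() rather than whatever similarity measure the package uses.

```r
# Illustrative sketch only; every function and argument here is hypothetical.

# Stage-2 helper: underscores to spaces, collapse whitespace, keep the first
# two words (this drops authority strings, but also subspecies epithets),
# and standardise capitalisation to "Genus epithet".
normalize_name <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)
  x <- trimws(gsub("\\s+", " ", x))
  parts <- strsplit(x, " ", fixed = TRUE)
  vapply(parts, function(p) {
    p <- tolower(p[seq_len(min(2, length(p)))])
    p[1] <- paste0(toupper(substring(p[1], 1, 1)), substring(p[1], 2))
    paste(p, collapse = " ")
  }, character(1))
}

# x: names in the data; y: tip labels; synonyms: named character vector
# (names = synonym, values = accepted name, both in normalized form).
match_cascade <- function(x, y, synonyms = NULL, fuzzy = FALSE, max_dist = 2) {
  out <- data.frame(name_x = x, name_y = NA_character_,
                    match_type = "unresolved", stringsAsFactors = FALSE)
  norm_y <- normalize_name(y)
  for (i in seq_along(x)) {
    # Stage 1: exact string equality
    if (x[i] %in% y) {
      out$name_y[i] <- x[i]; out$match_type[i] <- "exact"; next
    }
    # Stage 2: equality after normalization
    nx <- normalize_name(x[i])
    hit <- match(nx, norm_y)
    if (!is.na(hit)) {
      out$name_y[i] <- y[hit]; out$match_type[i] <- "normalized"; next
    }
    # Stage 3: synonym lookup, then match the accepted name
    if (!is.null(synonyms) && nx %in% names(synonyms)) {
      hit <- match(synonyms[[nx]], norm_y)
      if (!is.na(hit)) {
        out$name_y[i] <- y[hit]; out$match_type[i] <- "synonym"; next
      }
    }
    # Stage 4 (opt-in): closest tip label within max_dist edits
    if (fuzzy && length(norm_y) > 0) {
      d <- utils::adist(nx, norm_y)[1, ]
      best <- which.min(d)
      if (d[best] <= max_dist) {
        out$name_y[i] <- y[best]; out$match_type[i] <- "fuzzy"
      }
    }
  }
  out
}

# Toy inputs (invented for illustration)
tree_tips  <- c("Corvus_brachyrhynchos", "Corvus_corax", "Pica_pica")
data_names <- c("Corvus_corax", "Pica pica", "Corvus brachyrhnchos",
                "Pica rustica", "Turdus merula")
synonyms   <- c("Pica rustica" = "Pica pica")

res <- match_cascade(data_names, tree_tips, synonyms = synonyms, fuzzy = TRUE)
print(res)
```

Because each name leaves the loop at the first stage that succeeds, a name that matches exactly is never passed to the fuzzy stage, which is what makes the recorded match_type meaningful as a measure of how much intervention a match needed.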

Provenance

Every decision is logged in the mapping table (match_type, match_score, match_source) and in meta (package version, timestamp, taxonomic authority, fuzzy threshold, etc.). Use reconcile_report() to produce a shareable HTML audit trail for supplementary materials or collaborators.
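As a rough illustration of how such a table can be audited by hand, the sketch below builds a mock mapping in base R and pulls out the rows that deserve a second look. The data frame is invented: the column names follow the fields mentioned above and the name_x / name_y arguments of reconcile_override(), but the values, the 0 to 1 score scale, and the "unresolved" label are assumptions rather than documented behaviour.

```r
# Mock mapping table (invented values; score scale and labels are assumptions)
mapping <- data.frame(
  name_x      = c("Corvus_corax", "Pica pica", "Corvus brachyrhnchos", "Turdus merula"),
  name_y      = c("Corvus_corax", "Pica_pica", "Corvus_brachyrhynchos", NA),
  match_type  = c("exact", "normalized", "fuzzy", "unresolved"),
  match_score = c(1, 1, 0.95, NA),
  stringsAsFactors = FALSE
)

# How many names were resolved at each stage?
counts <- table(mapping$match_type)
print(counts)

# Rows worth checking manually: no target at all, or an imperfect score
needs_review <- mapping[
  is.na(mapping$name_y) |
    (!is.na(mapping$match_score) & mapping$match_score < 1),
]
print(needs_review)
```

With a real reconciliation object, the same filtering would presumably be applied to the table returned by reconcile_mapping().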

Splits and lumps

Taxonomic revisions often split one species into several, or lump several into one. reconcile_splits_lumps() flags these cases so you can decide how to handle them before analysis.
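One simple way to think about these cases is in terms of the shape of the name mapping: a lump shows up as several source names pointing at the same target, and a split as one source name with several candidate targets. The base-R sketch below flags both patterns in a toy table of candidate matches. It only illustrates the idea; the placeholder names, the candidates table, and the flag labels are hypothetical, and reconcile_splits_lumps() may use different criteria.

```r
# Toy candidate matches with placeholder names (all invented)
candidates <- data.frame(
  name_x = c("Aus bus", "Aus cus", "Dus eus", "Dus eus", "Fus gus"),
  name_y = c("Aus_bus", "Aus_bus", "Dus_eus", "Dus_hus", "Fus_gus"),
  stringsAsFactors = FALSE
)

# Number of distinct targets per source name, and distinct sources per target
n_targets <- tapply(candidates$name_y, candidates$name_x,
                    function(v) length(unique(v)))
n_sources <- tapply(candidates$name_x, candidates$name_y,
                    function(v) length(unique(v)))

candidates$flag <- "one_to_one"
# Several source names share one target: a lump
candidates$flag[as.vector(n_sources[candidates$name_y] > 1)] <- "lump"
# One source name has several candidate targets: a split
candidates$flag[as.vector(n_targets[candidates$name_x] > 1)] <- "split"

print(candidates)
```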

Tree augmentation

When unresolved species have congeners in the tree, reconcile_augment() can graft them in as sister taxa at genus level. This is an exploratory aid: always run sensitivity analyses with and without augmented tips.
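The precondition (an unresolved species having at least one congener among the tips) is easy to check by hand. The base-R sketch below does only that eligibility check on toy inputs; it does not graft anything and is not how reconcile_augment() is implemented. The helper genus_of() and both input vectors are hypothetical.

```r
# Hypothetical helper: genus = everything before the first underscore or space
genus_of <- function(x) sub("[_ ].*$", "", x)

# Toy inputs (invented for illustration)
tree_tips  <- c("Corvus_corax", "Corvus_brachyrhynchos", "Pica_pica")
unresolved <- c("Corvus_corone", "Turdus_merula")

# Which unresolved species share a genus with at least one tip?
has_congener <- genus_of(unresolved) %in% genus_of(tree_tips)
graftable    <- unresolved[has_congener]

# The tips each graftable species could be attached near
congeners_in_tree <- split(tree_tips, genus_of(tree_tips))[unique(genus_of(graftable))]

print(graftable)
print(congeners_in_tree)
```

Species with no congener in the tree (Turdus_merula in this toy case) cannot be placed at genus level and stay unresolved.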


References

Mizuno, A., Drobniak, S.M., Williams, C., Lagisz, M. & Nakagawa, S. (2025) Promoting the use of phylogenetic multinomial generalised mixed-effects model to understand the evolution of discrete traits. Journal of Evolutionary Biology 38:1699–1715. doi:10.1093/jeb/voaf116

Norman, K.E., Chamberlain, S. & Boettiger, C. (2020) taxadb: A high-performance local taxonomic database interface. Methods in Ecology and Evolution 11:1153–1159. doi:10.1111/2041-210X.13440

Paradis, E. & Schliep, K. (2019) ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35:526–528. doi:10.1093/bioinformatics/bty633

Author

Maintainer: Shinichi Nakagawa <itchyshin@gmail.com> [copyright holder]

Authors: