Connects reconciled species names to an external phylogenetic resource
and returns a pruned candidate tree plus a report of which species
were matched and which were dropped. Intended as the bridge between
the package's reconciliation cascade and any downstream comparative
analysis: feed the result of reconcile_data() / reconcile_tree()
(or any character vector of cleaned names) into pr_get_tree() and
get back a phylo ready for reconcile_apply().
Arguments
- x
One of:
  - a reconciliation object
  returned by reconcile_tree() or reconcile_data(); species are taken from the reconciled name_y column, with NAs and unresolved entries dropped.
  - a character vector
  used directly after deduplication and NA removal.
  - a data frame
  species_col must name a character column; its unique non-NA values are used.
- source
A length-1 character vector. Which external backend to use. One of:
  - "rotl"
  Open Tree of Life synthesis tree, via the CRAN package rotl. Universal taxonomic coverage; calls tnrs_match_names() to resolve names to OTT ids and then tol_induced_subtree().
  - "rtrees"
  Taxon-specific mega-trees (bird, mammal, fish, amphibian, reptile, plant, shark/ray, bee, butterfly) via the GitHub package rtrees (https://daijiang.github.io/rtrees/). Requires taxon = "<group>". Calls get_tree(). Install with pak::pak("daijiang/rtrees") (GitHub-only).
  Grafting behaviour: when an input species is not in the chosen mega-tree, rtrees::get_tree() grafts it at the genus level (tip suffix *) or family level (**); if no co-family species is in the mega-tree, the species is dropped. The placement of every input species is reported per row in result$backend_meta$placement (a tibble with columns input_name, tree_name, placement_status, where placement_status is one of "exact", "genus_added", "family_added", "skipped", or "unmatched"). The grafting itself cannot be disabled at the wrapper level (rtrees 1.0.4 has no switch); to exclude grafted tips from a downstream analysis, filter the placement table on placement_status == "exact" and prune the tree to those tip labels. See ?rtrees::get_tree for upstream control (scenario: where a graft is placed, but not whether).
  - "clootl"
  Bird-only phylogenies in current Clements taxonomy, via the GitHub package clootl (https://github.com/eliotmiller/clootl). Calls extractTree(). Install with pak::pak("eliotmiller/clootl").
  - "fishtree"
  Fish-only time-calibrated phylogeny (Rabosky et al. 2018), via the CRAN package fishtree. Calls fishtree_phylogeny() (single tree) or fishtree_complete_phylogeny() (multi-tree posterior; triggered by n_tree > 1). Requires exact name matches against the Fish Tree of Life taxonomy; pre-clean with reconcile_data() (with a taxadb authority) for best results.
  - "datelife"
  Universal database of pre-computed chronograms (Sanchez Reyes et al. 2024, Syst. Biol. 73:470), via the GitHub package datelife (https://github.com/phylotastic/datelife). Returns a single SDM-summary chronogram by default; with n_tree > 1, returns a multiPhylo of up to that many per-source candidate chronograms. Install before use with pak::pak("phylotastic/datelife"). The package is GitHub-only (archived from CRAN in 2024, with a heavy transitive dependency tree that pak can't auto-resolve), so prepR4pcm does NOT pull it in via Suggests.
  - "auto"
  Fall-through dispatcher: tries installed backends in priority order (rtrees if taxon is provided, then rotl, fishtree, clootl, datelife) and returns the first result that resolves at least min_match of the species. Useful for first-pass exploration when you don't yet know which backend covers your taxa.
- species_col
A length-1 character vector. Required when x is a data frame; ignored otherwise.
- taxon
A length-1 character vector. Required when source = "rtrees". One of "bird", "mammal", "fish", "amphibian", "reptile", "plant", "shark_ray", "bee", "butterfly" (see the rtrees package help for get_tree). Ignored for other backends.
- n_tree
A length-1 positive integer. How many trees to request from the backend. Default 1L (a single phylo, for backward compatibility). Each backend negotiates this differently:
  - "rotl"
  Always returns 1 (the synthesis tree). A one-shot warning is emitted if n_tree > 1.
  - "rtrees"
  n_tree is informational only here. rtrees::get_tree() does not have an n_tree argument; the multi-tree count is fixed by which mega-tree was selected. Reference trees rtrees uses internally: birds = Jetz et al. 2012 (https://birdtree.org, 100 posterior trees); mammals = Upham et al. 2019 (VertLife, 100 by default; set mammal_tree = "phylacine" for the PHYLACINE set); amphibians + squamates = VertLife; fish = Rabosky et al. 2018 (also wrapped by source = "fishtree"); plants = V.PhyloMaker; bees = Bee Tree of Life. Override which mega-tree is used via ... (e.g. bee_tree = "bootstrap" for 100 bee trees instead of the single ML tree). Requires taxon.
  - "clootl"
  n_tree = 1 calls clootl::extractTree() and works out of the box with the v1.6 / 2025 taxonomy bundled in the clootl package. n_tree > 1 calls clootl::sampleTrees(count = n_tree) (capped at 100 upstream) and requires the AvesData repo to be set up once via clootl::get_avesdata_repo(".") first; otherwise it errors with "AvesData repo not found".
  - "fishtree"
  Single phylo via fishtree_phylogeny() when n_tree = 1; switches to fishtree_complete_phylogeny(), returning a multiPhylo of stochastically polytomy-resolved trees, when n_tree > 1.
  - "datelife"
  summary_format = "phylo_sdm" (single summary chronogram) when n_tree = 1; switches to summary_format = "phylo_all" (one chronogram per source, capped at n_tree) when n_tree > 1.
The result's tree slot is a multiPhylo when the backend returns more than one tree, and a phylo otherwise.
- cache
Logical. Cache the result on disk and reuse it on subsequent identical calls? Default FALSE. When TRUE, the request is keyed by (species, source, n_tree, taxon, tnrs, ...) and stored at pr_tree_cache_dir(). See pr_tree_cache_status() and pr_tree_cache_clear() for inspecting / wiping the cache.
- tnrs
A length-1 character vector. Run a TNRS preflight (Open Tree of Life name resolution via rotl::tnrs_match_names) on the species list before calling the backend? One of:
  - "auto" (default)
  Run TNRS only for fishtree, where OTL-resolved names tend to improve the match rate. Not run for clootl by default: clootl uses the eBird / Clements taxonomy, so OTL-resolved names often differ from clootl's preferred names, and the network call is the dominant cost for large requests (roughly 15 min for 10k species when TNRS was run by default). Pass tnrs = "always" if you want it for clootl anyway.
  - "always"
  Run TNRS regardless of backend.
  - "never"
  Skip TNRS even when the backend would benefit.
When rotl is not installed, TNRS is skipped and a one-shot warning is emitted.
- min_match
A length-1 numeric in [0, 1]. Only used when source = "auto". The minimum fraction of input species a backend must resolve for the dispatcher to accept its result; if no backend meets the threshold, the best available is returned with a warning. Default 0.8.
- check_ultrametric
Logical. After producing the tree, check that it is ultrametric (all tips equidistant from the root) and warn if not. Default TRUE. Only enforced for backends that normally return chronograms (rtrees, clootl, fishtree, datelife); rotl returns a topology without real branch lengths, so the check is skipped. To force ultrametricity on a non-ultrametric result, use phytools::force.ultrametric() or ape::chronos() directly; prepR4pcm does not modify the tree silently.
- resolve_polytomies
Logical. After retrieval, resolve any polytomies via ape::multi2di() with random = TRUE? Default FALSE (backward compatible; topology preserved). Useful for phylogenetic meta-analysis, where a strictly bifurcating tree is required for pr_phylo_cor() / ape::vcv() to produce a full-rank correlation matrix.
- branch_lengths
A length-1 character vector or NULL. After retrieval (and after polytomy resolution if requested), assign branch lengths via the named method? Default NULL (no transformation; the backend's branch lengths are kept as-is). Other values:
  - "grafen"
  Grafen's (1989) method via ape::compute.brlen() with method = "Grafen". The canonical choice for phylogenetic meta-analysis when the topology comes from rotl (whose edge lengths are unit-length placeholders). See Cinar et al. (2022) Methods Ecol. Evol. 13:383, who use this exact pattern.
  - "compute.brlen"
  Same as "grafen": Grafen is the default method of ape::compute.brlen(). Provided as an alias for users who think in terms of the underlying function name.
  - "unit"
  Set every edge length to 1. The crudest option; useful only for sensitivity-analysis comparisons.
- ...
Backend-specific arguments forwarded to the underlying call. See the help page of the underlying function in the relevant backend package (tol_induced_subtree in rotl, extractTree in clootl, get_tree in rtrees, fishtree_phylogeny / fishtree_complete_phylogeny in fishtree, datelife_search in datelife) for the full list.
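The post-retrieval options described above (resolve_polytomies, branch_lengths, check_ultrametric) name the ape functions they rely on. The following is a minimal sketch of those steps on a toy tree, assuming ape is installed. The toy topology and the object names are illustrative only, and this is not pr_get_tree()'s internal code.

```r
library(ape)

# Toy topology with one polytomy (A, B, C) and no branch lengths,
# standing in for a tree retrieved with source = "rotl".
tr <- read.tree(text = "((A,B,C),(D,E));")
is.binary(tr)                      # FALSE: the (A, B, C) node is a polytomy

# resolve_polytomies = TRUE: random resolution into a bifurcating tree.
set.seed(1)                        # multi2di(random = TRUE) is stochastic
tr_bin <- multi2di(tr, random = TRUE)

# branch_lengths = "grafen": Grafen (1989) branch lengths.
tr_grafen <- compute.brlen(tr_bin, method = "Grafen")

# branch_lengths = "unit": every edge set to 1 (assigned by hand here).
tr_unit <- tr_bin
tr_unit$edge.length <- rep(1, nrow(tr_unit$edge))

# check_ultrametric = TRUE: the kind of test that is applied.
is.ultrametric(tr_grafen)

# Phylogenetic correlation matrix of the kind used in meta-analysis.
phylo_cor <- vcv(tr_grafen, corr = TRUE)
dim(phylo_cor)
```

Because the random resolution differs between runs, setting a seed before the call keeps a sensitivity analysis reproducible.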
Value
A list with class pr_tree_result and components:
- tree
A phylo (single) or multiPhylo (posterior) object from the chosen backend, pruned to the matched species.
- matched
Character vector of names from the user's original input (preserving the input format, including any underscores) that resolved to a tip in tree. The dispatcher enforces that matched names are a subset of unique(input): TNRS substitution, normalisation, and backend-internal name juggling cannot leak intermediate names into this slot.
- unmatched
Character vector of names from the original input that did not resolve. Disjoint from matched; length(matched) + length(unmatched) == length(unique(input)) always holds. Inspect these and consider running them back through reconcile_suggest() or applying a manual override.
- mapping
A tibble with one row per unique input species. Core columns: input_name, normalized_name, query_name, tree_name, in_tree, match_type, and placement_status. This is the audit trail for name handling: input_name is what the user supplied, normalized_name is the result of pr_normalize_names(), query_name is the backend query after optional TNRS, tree_name is the actual returned tip label, and match_type is one of "exact", "normalized", "tnrs", or "unmatched". For source = "rtrees", placement_status carries the grafting status from backend_meta$placement; otherwise it is NA.
Four further columns record what rotl's TNRS resolver reported for each name: tnrs_number_matches, tnrs_is_synonym, tnrs_approximate_match, and tnrs_flags. These are NA for backends or tnrs settings where TNRS did not run. tnrs_number_matches > 1 flags a homonym, meaning the resolved name is only one of several candidate taxa.
- source
The backend that produced the tree.
- backend_meta
A named list of diagnostic information. Standard fields populated by the dispatcher:
  - n_queried
  Unique input species count.
  - n_requested
  The n_tree argument the user passed.
  - n_returned
  Number of trees in tree (1 for phylo).
  - n_matched
  Equal to length(matched).
  - tnrs_replacements
  When TNRS ran (tnrs = "always", or tnrs = "auto" for fishtree) and rotl is installed: a named character vector mapping original input to the TNRS-resolved name, for names that TNRS changed. NULL when no TNRS or no replacements occurred. A one-shot cli warning lists the first three substitutions on the call, so silent name correction is impossible.
  - tip_set_consistent
  Logical. For multiPhylo returns: TRUE if every tree shares the same tip set.
  - dropped_per_tree
  For multiPhylo returns where tip_set_consistent = FALSE: a list of character vectors, one per tree, listing species missing from each tree relative to the union of all trees. NULL otherwise.
  - tree_provenance
  A list with one entry per returned tree (so tree[[i]] pairs with backend_meta$tree_provenance[[i]] when tree is a multiPhylo).
Backend-specific fields (e.g. taxon, n_grafted, grafted_tips, placement for rtrees; backend, type, tnrs_table for fishtree / rotl; summary_format, source_citations, reference for datelife) are merged in at the top level by the wrapper that called the backend. The rtrees-specific placement slot is a tibble with one row per unique input species and columns input_name, tree_name, placement_status ("exact", "genus_added", "family_added", "skipped", or "unmatched").
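For source = "rtrees", excluding grafted tips comes down to filtering the placement table and pruning the tree. The sketch below uses hand-built stand-ins for res$tree and res$backend_meta$placement, and assumes ape is installed. The species names are invented, and the exact format of tree_name (underscores, graft suffixes) is an assumption made for illustration.

```r
library(ape)

# Hypothetical stand-in for res$tree; suffixes mark grafted tips.
tree <- read.tree(text = "(((Aus_bus:1,Aus_cus:1):1,Dus_eus:2):1,Fus_gus:3);")
tree$tip.label[tree$tip.label == "Aus_cus"] <- "Aus_cus*"    # genus-level graft
tree$tip.label[tree$tip.label == "Dus_eus"] <- "Dus_eus**"   # family-level graft

# Hypothetical stand-in for res$backend_meta$placement.
placement <- data.frame(
  input_name       = c("Aus bus", "Aus cus", "Dus eus", "Fus gus", "Zus zus"),
  tree_name        = c("Aus_bus", "Aus_cus*", "Dus_eus**", "Fus_gus", NA),
  placement_status = c("exact", "genus_added", "family_added",
                       "exact", "unmatched"),
  stringsAsFactors = FALSE
)

# Keep only tips that were found in the mega-tree without grafting.
exact_tips <- placement$tree_name[placement$placement_status == "exact"]
tree_exact <- keep.tip(tree, exact_tips)
tree_exact$tip.label
```

Filtering on the table rather than on the tip suffixes avoids depending on how a given rtrees version labels grafted tips.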
Details
Each backend is provided by an external R package that we list in
Suggests rather than Imports, so installing prepR4pcm does
not pull them in automatically. The error message tells you what
to install if you ask for a backend you don't have.
Name handling. Input names are run through
pr_normalize_names() before the backend is queried — underscores
become spaces, leading/trailing whitespace is trimmed, OTT-id
suffixes (e.g. ott770315) and authority strings (e.g.
(Linnaeus, 1758)) are stripped, and hybrid signs are
standardised. The matched and unmatched slots in the result use
the original input format (as you typed it), not the normalised
form.
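As an illustration of the name handling described above, here is a rough base-R approximation of the listed normalisation steps, followed by matching that reports names in their original form. normalize_sketch() is a hypothetical helper written for this example; the real pr_normalize_names() may differ in detail, and hybrid signs are not handled here. The tip labels are also made up.

```r
# Hypothetical stand-in for the normalisation steps (not the package's code).
normalize_sketch <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)                               # underscores -> spaces
  x <- gsub("[[:space:]]*\\([^()]*,[[:space:]]*[0-9]{4}\\)", "", x)  # "(Linnaeus, 1758)"
  x <- trimws(x)                                                     # outer whitespace
  x <- gsub("[[:space:]]+ott[0-9]+$", "", x)                         # trailing OTT id
  trimws(gsub("[[:space:]]+", " ", x))                               # collapse runs
}

input <- c("Homo_sapiens_ott770315",
           " Passer domesticus (Linnaeus, 1758)",
           "Pan troglodytes",
           "Pan troglodytes")
tree_tips <- c("Homo sapiens", "Passer domesticus")  # pretend tip labels

u <- unique(input)
normalized <- normalize_sketch(u)
in_tree <- normalized %in% tree_tips

# matched / unmatched keep the names exactly as supplied.
matched <- u[in_tree]
unmatched <- u[!in_tree]

data.frame(input_name = u, normalized_name = normalized, in_tree = in_tree)
```

In this sketch matched and unmatched partition unique(input), which mirrors the invariant the dispatcher is documented to enforce.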
When TNRS substitutes a name (only when tnrs = "always", or for
the fishtree backend under tnrs = "auto"), the replacement is
recorded in result$backend_meta$tnrs_replacements as a named
character vector (original = resolved). A one-shot cli warning
lists the first few substitutions on the call itself.
TNRS also returns structured match metadata. pr_get_tree() records
it per name in the mapping tibble: tnrs_number_matches,
tnrs_is_synonym, tnrs_approximate_match, and tnrs_flags. When a
name resolves to more than one taxon (tnrs_number_matches > 1, a
homonym), a one-shot cli warning names the affected species, since
the resolved name is then only one of several candidates.
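A short sketch of how the TNRS audit information might be inspected after a call. The objects below are hand-built stand-ins for res$backend_meta$tnrs_replacements and res$mapping, restricted to a few of the documented columns. All names and values are invented for illustration and do not come from a real TNRS query.

```r
# Hypothetical stand-in for res$backend_meta$tnrs_replacements
# (named character vector: original = resolved).
tnrs_replacements <- c("Gadus morrhua" = "Gadus morhua")

# Hypothetical stand-in for a subset of res$mapping.
mapping <- data.frame(
  input_name             = c("Gadus morrhua", "Salmo salar", "Morus alba"),
  query_name             = c("Gadus morhua", "Salmo salar", "Morus alba"),
  match_type             = c("tnrs", "exact", "exact"),
  tnrs_number_matches    = c(1L, 1L, 2L),
  tnrs_is_synonym        = c(FALSE, FALSE, FALSE),
  tnrs_approximate_match = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Names that TNRS changed: originals are the names, resolved forms the values.
changed <- data.frame(original = names(tnrs_replacements),
                      resolved = unname(tnrs_replacements),
                      stringsAsFactors = FALSE)

# Homonyms: more than one candidate taxon. is.na() guards rows where TNRS did not run.
is_homonym <- !is.na(mapping$tnrs_number_matches) &
  mapping$tnrs_number_matches > 1
homonyms <- mapping$input_name[is_homonym]

# Approximate (fuzzy) matches are also worth a manual look.
fuzzy <- mapping$input_name[mapping$tnrs_approximate_match %in% TRUE]
```

Checking homonyms and approximate matches before analysis is one way to confirm that the resolved name refers to the intended taxon.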
References
Backend reference trees:
Jetz, W., Thomas, G. H., Joy, J. B., Hartmann, K., & Mooers, A. O.
(2012). The global diversity of birds in space and time.
Nature 491: 444–448. doi:10.1038/nature11631
(Used by rtrees for taxon = "bird" and by BirdTree.)
Rabosky, D. L., Chang, J., Title, P. O., Cowman, P. F., Sallan, L.,
Friedman, M., Kaschner, K., Garilao, C., Near, T. J., Coll, M., &
Alfaro, M. E. (2018). An inverse latitudinal gradient in speciation
rate for marine fishes. Nature 559: 392–395.
doi:10.1038/s41586-018-0273-1
(Fish Tree of Life; used by source = "fishtree" and by rtrees
for taxon = "fish".)
Upham, N. S., Esselstyn, J. A., & Jetz, W. (2019). Inferring the
mammal tree: Species-level sets of phylogenies for questions in
ecology, evolution, and conservation. PLOS Biology 17(12):
e3000494. doi:10.1371/journal.pbio.3000494
(VertLife mammal posterior; used by rtrees for taxon = "mammal"
with mammal_tree = "vertlife".)
Jin, Y. & Qian, H. (2019). V.PhyloMaker: an R package that can
generate very large phylogenies for vascular plants.
Ecography 42(8): 1353–1359. doi:10.1111/ecog.04434
(Vascular-plant mega-tree used by rtrees for taxon = "plant";
also the basis for the source = "vphylomaker" augmentation
backend in reconcile_augment().)
Sanchez Reyes, L. L., O'Meara, B. C., Brown, J. W., & McTavish, E.
J. (2024). DateLife: Leveraging databases and analytical tools to
reveal the dated Tree of Life. Systematic Biology 73(2):
470–485. doi:10.1093/sysbio/syae015
(Used by source = "datelife" and by pr_date_tree().)
Methodology:
Chang, J., Rabosky, D. L., & Alfaro, M. E. (2019). Estimating
diversification rates on incompletely sampled phylogenies:
Theoretical concerns and practical solutions. Systematic Biology
69(3): 602–611. doi:10.1093/sysbio/syz081
(Stochastic polytomy resolution behind fishtree_complete_phylogeny()
for n_tree > 1.)
Michonneau, F., Brown, J. W., & Winter, D. J. (2016). rotl: an R
package to interact with the Open Tree of Life data. Methods in
Ecology and Evolution 7(12): 1476–1481.
doi:10.1111/2041-210X.12593
(TNRS preflight and source = "rotl".)
See also
reconcile_tree() / reconcile_data() for producing the
reconciled species list that feeds this function;
reconcile_apply() for combining the returned phylo with the
data frame ready for analysis;
reconcile_augment() for filling gaps in an existing tree
(a tree-aware alternative to retrieving a fresh tree);
pr_date_tree() for time-calibrating an existing topology;
pr_cite_tree() for formatting citations for a tree result;
pr_tree_compare() for comparing two or more retrieved trees;
pr_get_tree_status() for checking which backends are installed
and reachable;
pr_tree_cache_dir() / pr_tree_cache_status() /
pr_tree_cache_clear() for managing the on-disk cache.
The companion package
pigauto consumes a
multiPhylo directly via multi_impute_trees() for posterior-
tree PCMs — request a posterior sample with n_tree > 1.
Examples
if (interactive()) {
  # Example 1: birds via clootl (Clements taxonomy). Uses the
  # bundled AVONET subset (657 species placed in the Clements tree).
  data(avonet_subset)
  if (requireNamespace("clootl", quietly = TRUE)) {
    res <- pr_get_tree(avonet_subset, species_col = "Species1",
                       source = "clootl")
    ape::Ntip(res$tree)   # species placed in the tree
    head(res$unmatched)   # names clootl could not resolve
  }

  # Example 2: fish via fishtree (Rabosky et al. 2018, time-calibrated)
  if (requireNamespace("fishtree", quietly = TRUE)) {
    res <- pr_get_tree(c("Salmo salar", "Esox lucius", "Gadus morhua"),
                       source = "fishtree")
    res$tree
  }

  # Example 3: anything via rotl (universal, network)
  if (requireNamespace("rotl", quietly = TRUE)) {
    res <- pr_get_tree(c("Homo sapiens", "Pan troglodytes",
                         "Mus musculus"),
                       source = "rotl")
    res$tree
  }

  # Example 4: posterior of fish trees (50 trees, for multi-tree PCMs)
  if (requireNamespace("fishtree", quietly = TRUE)) {
    res <- pr_get_tree(c("Salmo salar", "Esox lucius"),
                       source = "fishtree", n_tree = 50)
    class(res$tree)       # "multiPhylo"
  }
}