
Connects reconciled species names to an external phylogenetic resource and returns a pruned candidate tree plus a report of which species were matched and which were dropped. Intended as the bridge between the package's reconciliation cascade and any downstream comparative analysis: feed the result of reconcile_data() / reconcile_tree() (or any character vector of cleaned names) into pr_get_tree() and get back a phylo ready for reconcile_apply().

Usage

pr_get_tree(
  x,
  source = c("rotl", "rtrees", "clootl", "fishtree", "datelife", "auto"),
  species_col = NULL,
  taxon = NULL,
  n_tree = 1L,
  cache = FALSE,
  tnrs = c("auto", "always", "never"),
  min_match = 0.8,
  check_ultrametric = TRUE,
  resolve_polytomies = FALSE,
  branch_lengths = NULL,
  ...
)

Arguments

x

One of:

a reconciliation object

returned by reconcile_tree() or reconcile_data(); species are taken from the reconciled name_y column with NAs and unresolved entries dropped.

a character vector

used directly after deduplication and NA removal.

a data frame

species_col must name a character column; its unique non-NA values are used.
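The character-vector and data-frame cases above can be sketched in base R as follows. The helper name extract_species() is hypothetical and not part of the package; it only illustrates the deduplication and NA removal described here, and leaves out the reconciliation-object case because that object's internal layout is not documented in this section.

```r
# Hypothetical helper, for illustration only: mimics how a character vector
# or a data-frame column could be reduced to unique, non-NA species names.
extract_species <- function(x, species_col = NULL) {
  if (is.data.frame(x)) {
    stopifnot(
      is.character(species_col), length(species_col) == 1L,
      species_col %in% names(x),
      is.character(x[[species_col]])
    )
    x <- x[[species_col]]
  }
  stopifnot(is.character(x))
  unique(x[!is.na(x)])
}

sp_vec <- extract_species(c("Salmo salar", NA, "Esox lucius", "Salmo salar"))
sp_df <- extract_species(
  data.frame(Species1 = c("Gadus morhua", "Gadus morhua", NA),
             stringsAsFactors = FALSE),
  species_col = "Species1"
)
```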

source

A length-1 character vector. Which external backend to use. One of:

"rotl"

Open Tree of Life synthesis tree, via the CRAN package rotl. Universal taxonomic coverage; calls tnrs_match_names() to resolve names to OTT ids and then tol_induced_subtree().

"rtrees"

Taxon-specific mega-trees (bird, mammal, fish, amphibian, reptile, plant, shark/ray, bee, butterfly) via the GitHub package rtrees (https://daijiang.github.io/rtrees/). Requires taxon = "<group>". Calls get_tree(). Install with pak::pak("daijiang/rtrees") (GitHub-only). Grafting behaviour: when an input species is not in the chosen mega-tree, rtrees::get_tree() grafts it at the genus level (tip suffix *) or family level (**); if no co-family species is in the mega-tree, the species is dropped. The placement of every input species is reported per row in result$backend_meta$placement (a tibble with columns input_name, tree_name, placement_status, where placement_status is one of "exact", "genus_added", "family_added", "skipped", or "unmatched"). The grafting itself cannot be disabled at the wrapper level (rtrees 1.0.4 has no switch); to exclude grafted tips from a downstream analysis, filter the placement table on placement_status == "exact" and prune the tree to those tip labels. See ?rtrees::get_tree for upstream control: its scenario argument governs where a graft is placed, but not whether one is made.

"clootl"

Bird-only phylogenies in current Clements taxonomy, via the GitHub package clootl (https://github.com/eliotmiller/clootl). Calls extractTree(). Install with pak::pak("eliotmiller/clootl").

"fishtree"

Fish-only time-calibrated phylogeny (Rabosky et al. 2018), via the CRAN package fishtree. Calls fishtree_phylogeny() (single tree) or fishtree_complete_phylogeny() (multi-tree posterior; triggered by n_tree > 1). Requires exact name matches against the Fish Tree of Life taxonomy — pre-clean with reconcile_data() (with a taxadb authority) for best results.

"datelife"

Universal database of pre-computed chronograms (Sanchez Reyes et al. 2024, Syst. Biol. 73:470), via the GitHub package datelife (https://github.com/phylotastic/datelife). Returns a single SDM-summary chronogram by default; with n_tree > 1, returns a multiPhylo of up to that many per-source candidate chronograms. Install before use with pak::pak("phylotastic/datelife"). The package is GitHub-only (archived from CRAN in 2024, with a heavy transitive dependency tree that pak can't auto-resolve), so prepR4pcm does NOT pull it in via Suggests.

"auto"

Fall-through dispatcher: tries installed backends in priority order (rtrees if taxon is provided, then rotl, fishtree, clootl, datelife) and returns the first result that resolves at least min_match of the species. Useful for first-pass exploration when you don't yet know which backend covers your taxa.
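For the "rtrees" backend described above, excluding grafted tips could look roughly like the sketch below. The tree and the placement table are mock objects built by hand, not real rtrees output, and the sketch assumes that tree_name holds the tip label exactly as it appears in the tree (including any * or ** suffix); check both against an actual result before relying on it.

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # Mock four-tip tree; labels are set afterwards so the grafting suffixes
  # never pass through the Newick parser.
  tree <- ape::read.tree(text = "((A:1,B:1):1,(C:1,D:1):1);")
  tree$tip.label <- c("Aus_bus", "Aus_cus*", "Dus_eus", "Fus_gus**")

  # Mock stand-in for result$backend_meta$placement (made-up species).
  placement <- data.frame(
    input_name = c("Aus bus", "Aus cus", "Dus eus", "Fus gus", "Hus ius"),
    tree_name = c("Aus_bus", "Aus_cus*", "Dus_eus", "Fus_gus**", NA),
    placement_status = c("exact", "genus_added", "exact",
                         "family_added", "skipped"),
    stringsAsFactors = FALSE
  )

  # Keep only species found in the mega-tree itself, then prune.
  exact_tips <- placement$tree_name[placement$placement_status == "exact"]
  exact_tree <- ape::keep.tip(tree, exact_tips)
}
```

With a real result, tree would be res$tree and placement would be res$backend_meta$placement.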

species_col

A length-1 character vector. Required when x is a data frame; ignored otherwise.

taxon

A length-1 character vector. Required when source = "rtrees". One of "bird", "mammal", "fish", "amphibian", "reptile", "plant", "shark_ray", "bee", "butterfly" (see the rtrees package help for get_tree). Ignored for other backends.

n_tree

A length-1 positive integer. How many trees to request from the backend. Default 1L (single phylo for back-compat). Each backend negotiates this differently:

"rotl"

Always returns 1 (the synthesis tree). A one-shot warning is emitted if n_tree > 1.

"rtrees"

n_tree is informational only here. rtrees::get_tree() does not have an n_tree argument; the multi-tree count is fixed by which mega-tree was selected. Reference trees rtrees uses internally: birds = Jetz et al. 2012 (https://birdtree.org, 100 posterior trees); mammals = Upham et al. 2019 (VertLife, 100 by default; set mammal_tree = "phylacine" for the PHYLACINE set); amphibians + squamates = VertLife; fish = Rabosky et al. 2018 (also wrapped by source = "fishtree"); plants = V.PhyloMaker; bees = Bee Tree of Life. Override which mega-tree is used via ... (e.g. bee_tree = "bootstrap" for 100 bee trees instead of the single ML tree). Requires taxon.

"clootl"

n_tree = 1 calls clootl::extractTree() and works out of the box with the v1.6 / 2025 taxonomy bundled in the clootl package. n_tree > 1 calls clootl::sampleTrees(count = n_tree) (capped at 100 upstream) and requires the AvesData repo to be set up once via clootl::get_avesdata_repo(".") first; otherwise it errors with AvesData repo not found.

"fishtree"

Single phylo via fishtree_phylogeny() when n_tree = 1; switches to fishtree_complete_phylogeny() returning a multiPhylo of stochastically polytomy-resolved trees when n_tree > 1.

"datelife"

summary_format = "phylo_sdm" (single summary chronogram) when n_tree = 1; switches to summary_format = "phylo_all" (one chronogram per source, capped at n_tree) when n_tree > 1.

When the request returns a multiPhylo, the result's tree slot is multiPhylo; otherwise phylo.
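Because the tree slot can hold either class, downstream code often needs to treat both uniformly. A minimal sketch, using random ape trees as stand-ins for a real result; as_tree_list() is a hypothetical helper, not a package function.

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # Hypothetical helper: coerce a phylo or multiPhylo to a plain list of phylo.
  as_tree_list <- function(tree) {
    if (inherits(tree, "multiPhylo")) {
      # `[[` on a multiPhylo returns a complete phylo for each element.
      lapply(seq_along(tree), function(i) tree[[i]])
    } else if (inherits(tree, "phylo")) {
      list(tree)
    } else {
      stop("expected a phylo or multiPhylo object")
    }
  }

  single <- ape::rtree(5)          # stand-in for an n_tree = 1 result
  posterior <- ape::rmtree(3, 5)   # stand-in for an n_tree = 3 result

  n_single <- length(as_tree_list(single))
  n_posterior <- length(as_tree_list(posterior))
  tip_counts <- vapply(as_tree_list(posterior), ape::Ntip, integer(1))
}
```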

cache

Logical. Cache the result on disk and reuse it on subsequent identical calls? Default FALSE. When TRUE, the request is keyed by (species, source, n_tree, taxon, tnrs, ...) and stored at pr_tree_cache_dir(). See pr_tree_cache_status() and pr_tree_cache_clear() for inspecting / wiping the cache.
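To illustrate what keying a request by (species, source, n_tree, taxon, tnrs, ...) can mean in practice, here is a base-R sketch. It is not the package's implementation: make_cache_key() is hypothetical, and sorting the species so that input order does not change the key is an assumption of this sketch.

```r
# Hypothetical helper: derive a reproducible key from the request parameters
# by serialising them to a temporary file and taking its MD5 checksum.
make_cache_key <- function(species, source, n_tree = 1L, taxon = NULL,
                           tnrs = "auto", ...) {
  request <- list(
    species = sort(unique(species)),   # assumption: order-insensitive key
    source = source,
    n_tree = as.integer(n_tree),
    taxon = taxon,
    tnrs = tnrs,
    dots = list(...)
  )
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  saveRDS(request, tmp, compress = FALSE)
  unname(tools::md5sum(tmp))
}

key_a <- make_cache_key(c("Salmo salar", "Esox lucius"), "fishtree")
key_b <- make_cache_key(c("Esox lucius", "Salmo salar"), "fishtree")
key_c <- make_cache_key(c("Salmo salar", "Esox lucius"), "fishtree",
                        n_tree = 50L)
```

Here key_a and key_b are identical because only the species order differs, while key_c differs because n_tree is part of the key.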

tnrs

A length-1 character vector. Run a TNRS preflight (Open Tree of Life name resolution via rotl::tnrs_match_names) on the species list before calling the backend? One of:

"auto" (default)

Run TNRS only for fishtree, where OTL-resolved names tend to improve the match rate. Not run for clootl by default: clootl uses the eBird / Clements taxonomy, so OTL-resolved names often differ from clootl's preferred names, and the network call is the dominant cost for large requests (roughly 15 min for 10k species when TNRS ran by default). Pass tnrs = "always" if you want it for clootl anyway.

"always"

Run TNRS regardless of backend.

"never"

Skip TNRS even when the backend would benefit.

When rotl is not installed, TNRS is skipped and a one-shot warning is emitted.
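The three settings reduce to a small decision rule, sketched here in base R. should_run_tnrs() is a hypothetical name used only for illustration, and the sketch leaves out the check for whether rotl is installed.

```r
# Hypothetical helper: decide whether the TNRS preflight would run,
# following the rules described for "auto", "always", and "never".
should_run_tnrs <- function(tnrs = c("auto", "always", "never"), source) {
  tnrs <- match.arg(tnrs)
  switch(tnrs,
    always = TRUE,
    never = FALSE,
    auto = identical(source, "fishtree")  # "auto" runs TNRS only for fishtree
  )
}

auto_fish <- should_run_tnrs("auto", source = "fishtree")
auto_bird <- should_run_tnrs("auto", source = "clootl")
forced_bird <- should_run_tnrs("always", source = "clootl")
```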

min_match

A length-1 numeric in [0, 1]. Only used when source = "auto". The minimum fraction of input species a backend must resolve for the dispatcher to accept its result; if no backend meets the threshold, the best available is returned with a warning. Default 0.8.
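The acceptance rule can be sketched with mock backends. Everything in this block is illustrative: pick_backend() and the two mock functions are hypothetical, and the real dispatcher calls external packages rather than simple subsetting functions.

```r
# Hypothetical dispatcher sketch: accept the first backend whose matched
# fraction reaches min_match; otherwise warn and return the best one seen.
pick_backend <- function(species, backends, min_match = 0.8) {
  species <- unique(species[!is.na(species)])
  best <- NULL
  for (name in names(backends)) {
    matched <- intersect(species, backends[[name]](species))
    result <- list(source = name, matched = matched,
                   fraction = length(matched) / length(species))
    if (result$fraction >= min_match) return(result)
    if (is.null(best) || result$fraction > best$fraction) best <- result
  }
  warning("no backend reached min_match; returning the best available (",
          best$source, ")", call. = FALSE)
  best
}

species <- paste("Genus", letters[1:10])
mock_backends <- list(
  first = function(sp) sp[1:3],    # resolves 3 of 10 species
  second = function(sp) sp[1:9]    # resolves 9 of 10 species
)

accepted <- pick_backend(species, mock_backends)
fallback <- suppressWarnings(
  pick_backend(species, mock_backends, min_match = 0.95)
)
```

With the default threshold the second mock backend is accepted; with min_match = 0.95 neither qualifies, so the best available (again the second) is returned with a warning.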

check_ultrametric

Logical. After producing the tree, check that it's ultrametric (all tips equidistant from the root) and warn if not. Default TRUE. Only enforced for backends that normally return chronograms (rtrees, clootl, fishtree, datelife); rotl returns a topology without real branch lengths, so the check is skipped. To force ultrametricity on a non-ultrametric result, use phytools::force.ultrametric() or ape::chronos() directly — prepR4pcm does not modify the tree silently.
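The same check can be run by hand with ape::is.ultrametric(). The two small trees below are made up for illustration; with a real result, pass res$tree instead.

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # All tips at distance 2 from the root: ultrametric.
  chronogram <- ape::read.tree(text = "((A:1,B:1):1,C:2);")
  # Tip B sits at distance 3 from the root: not ultrametric.
  uneven <- ape::read.tree(text = "((A:1,B:2):1,C:2);")

  is_ultra <- c(
    chronogram = ape::is.ultrametric(chronogram),
    uneven = ape::is.ultrametric(uneven)
  )
}
```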

resolve_polytomies

Logical. After retrieval, resolve any polytomies via ape::multi2di() with random = TRUE? Default FALSE (back-compat; topology preserved). Useful for phylogenetic meta-analysis, where a strictly bifurcating tree is required for pr_phylo_cor() / ape::vcv() to produce a full-rank correlation matrix.
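What this option does can be reproduced directly with ape on a toy tree. The tree below is made up; note that ape::multi2di() resolves polytomies by inserting zero-length branches, and that the random resolution differs between runs unless a seed is set.

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # Tips A, B, and C form a three-way polytomy.
  polytomous <- ape::read.tree(text = "((A:1,B:1,C:1):1,D:2);")

  set.seed(1)  # make the random resolution reproducible
  resolved <- ape::multi2di(polytomous, random = TRUE)

  binary_before <- ape::is.binary(polytomous)
  binary_after <- ape::is.binary(resolved)
}
```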

branch_lengths

A length-1 character vector or NULL. After retrieval (and after polytomy resolution if requested), assign branch lengths via the named method? Default NULL (no transformation; backend's branch lengths are kept as-is). Other values:

"grafen"

Grafen's (1989) method via ape::compute.brlen() with method = "Grafen". The canonical choice for phylogenetic meta-analysis when the topology comes from rotl (whose edge lengths are unit-length placeholders). See Cinar et al. (2022) Methods Ecol. Evol. 13:383, who use this exact pattern.

"compute.brlen"

Same as "grafen" — Grafen is ape::compute.brlen()'s default method. Provided as an alias for users who think in terms of the underlying function name.

"unit"

Set every edge length to 1. The crudest option; useful only for sensitivity-analysis comparisons.
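The "grafen" and "unit" options correspond to operations that can be applied directly with ape, as in this sketch on a made-up four-tip topology without branch lengths. It illustrates the transformations named above and is not the package's own code.

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # Topology only: no branch lengths in the Newick string.
  topology <- ape::read.tree(text = "((A,B),(C,D));")

  # Grafen's method (also the default of ape::compute.brlen()).
  grafen <- ape::compute.brlen(topology, method = "Grafen")

  # Unit branch lengths: every edge set to 1.
  unit <- topology
  unit$edge.length <- rep(1, nrow(unit$edge))
}
```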

...

Backend-specific arguments forwarded to the underlying call. See the help page of the underlying function in the relevant backend package (tol_induced_subtree in rotl, extractTree in clootl, get_tree in rtrees, fishtree_phylogeny / fishtree_complete_phylogeny in fishtree, datelife_search in datelife) for the full list.

Value

A list with class pr_tree_result and components:

tree

A phylo (single) or multiPhylo (posterior) object from the chosen backend, pruned to the matched species.

matched

Character vector of names from the user's original input (preserving the input format, including any underscores) that resolved to a tip in tree. The dispatcher enforces that matched names are a subset of unique(input) — TNRS substitution, normalisation, and backend-internal name juggling cannot leak intermediate names into this slot.

unmatched

Character vector of names from the original input that did not resolve. Disjoint from matched; length(matched) + length(unmatched) == length(unique(input)) always holds. Inspect these and consider running them back through reconcile_suggest() / a manual override.

mapping

A tibble with one row per unique input species. Core columns: input_name, normalized_name, query_name, tree_name, in_tree, match_type, and placement_status. This is the audit trail for name handling: input_name is what the user supplied, normalized_name is the result of pr_normalize_names(), query_name is the backend query after optional TNRS, tree_name is the actual returned tip label, and match_type is one of "exact", "normalized", "tnrs", or "unmatched". For source = "rtrees", placement_status carries the grafting status from backend_meta$placement; otherwise it is NA. Four further columns record what rotl's TNRS resolver reported for each name: tnrs_number_matches, tnrs_is_synonym, tnrs_approximate_match, and tnrs_flags. These are NA for backends or tnrs settings where TNRS did not run. tnrs_number_matches > 1 flags a homonym, meaning the resolved name is only one of several candidate taxa.

source

The backend that produced the tree.

backend_meta

A named list of diagnostic information. Standard fields populated by the dispatcher:

n_queried

Unique input species count.

n_requested

The n_tree argument the user passed.

n_returned

Number of trees in tree (1 for phylo).

n_matched

Equal to length(matched).

tnrs_replacements

When TNRS ran (tnrs = "always", or tnrs = "auto" for fishtree) and rotl is installed: a named character vector mapping original input to the TNRS-resolved name, for names that TNRS changed. NULL when no TNRS or no replacements occurred. A one-shot cli warning lists the first three substitutions on the call, so silent name correction is impossible.

tip_set_consistent

Logical. For multiPhylo returns: TRUE if every tree shares the same tip set.

dropped_per_tree

For multiPhylo returns where tip_set_consistent = FALSE: a list of character vectors, per tree, listing species missing from each tree relative to the union of all trees. NULL otherwise.

tree_provenance

A list with one entry per returned tree (so tree[[i]] pairs with backend_meta$tree_provenance[[i]] when tree is a multiPhylo).

Backend-specific fields (e.g. taxon, n_grafted, grafted_tips, placement for rtrees; backend, type, tnrs_table for fishtree / rotl; summary_format, source_citations, reference for datelife) are merged in at the top level by the wrapper that called the backend. The rtrees-specific placement slot is a tibble with one row per unique input species and columns input_name, tree_name, placement_status ("exact", "genus_added", "family_added", "skipped", or "unmatched").
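A quick way to audit a result is to check the matched / unmatched invariant and to list any homonyms flagged by TNRS. The mapping table below is a hand-made mock with invented values, standing in for res$mapping; only a few of the documented columns are included.

```r
# Mock stand-in for res$mapping (values are invented for illustration).
mapping <- data.frame(
  input_name = c("Salmo_salar", "Esox lucius", "Morus alba", "Nomen dubium"),
  in_tree = c(TRUE, TRUE, TRUE, FALSE),
  match_type = c("normalized", "exact", "tnrs", "unmatched"),
  tnrs_number_matches = c(1L, 1L, 2L, NA),
  stringsAsFactors = FALSE
)

matched <- mapping$input_name[mapping$in_tree]
unmatched <- mapping$input_name[!mapping$in_tree]

# The documented invariant: matched and unmatched partition the unique input.
stopifnot(
  length(intersect(matched, unmatched)) == 0L,
  length(matched) + length(unmatched) == length(unique(mapping$input_name))
)

# Names that TNRS resolved to more than one candidate taxon (homonyms).
homonyms <- mapping$input_name[
  !is.na(mapping$tnrs_number_matches) & mapping$tnrs_number_matches > 1
]
```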

Details

Each backend is provided by an external R package that we list in Suggests rather than Imports, so installing prepR4pcm does not pull them in automatically. The error message tells you what to install if you ask for a backend you don't have.

Name handling. Input names are run through pr_normalize_names() before the backend is queried — underscores become spaces, leading/trailing whitespace is trimmed, OTT-id suffixes (e.g. ott770315) and authority strings (e.g. (Linnaeus, 1758)) are stripped, and hybrid signs are standardised. The matched and unmatched slots in the result use the original input format (as you typed it), not the normalised form.
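A rough base-R approximation of these normalisation steps is shown below. normalize_sketch() is a hypothetical stand-in, not pr_normalize_names() itself; its regular expressions are assumptions about what the described rules might look like, and the hybrid-sign step is omitted.

```r
# Hypothetical approximation of the normalisation rules described above.
normalize_sketch <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)              # underscores to spaces
  x <- trimws(x)                                    # trim outer whitespace
  x <- sub("\\s+ott[0-9]+$", "", x)                 # drop trailing OTT id
  x <- sub("\\s*\\([^()]*[0-9]{4}\\)\\s*$", "", x)  # drop "(Author, year)"
  trimws(gsub("\\s+", " ", x))                      # collapse repeated spaces
}

cleaned <- normalize_sketch(c(
  "Homo_sapiens_ott770315",
  "  Passer domesticus (Linnaeus, 1758) "
))
```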

When TNRS substitutes a name (only when tnrs = "always", or for the fishtree backend under tnrs = "auto"), the replacement is recorded in result$backend_meta$tnrs_replacements as a named character vector (original = resolved). A one-shot cli warning lists the first few substitutions on the call itself.

TNRS also returns structured match metadata. pr_get_tree() records it per name in the mapping tibble: tnrs_number_matches, tnrs_is_synonym, tnrs_approximate_match, and tnrs_flags. When a name resolves to more than one taxon (tnrs_number_matches > 1, a homonym), a one-shot cli warning names the affected species, since the resolved name is then only one of several candidates.

References

Backend reference trees:

Jetz, W., Thomas, G. H., Joy, J. B., Hartmann, K., & Mooers, A. O. (2012). The global diversity of birds in space and time. Nature 491: 444–448. doi:10.1038/nature11631 (Used by rtrees for taxon = "bird" and by BirdTree.)

Rabosky, D. L., Chang, J., Title, P. O., Cowman, P. F., Sallan, L., Friedman, M., Kaschner, K., Garilao, C., Near, T. J., Coll, M., & Alfaro, M. E. (2018). An inverse latitudinal gradient in speciation rate for marine fishes. Nature 559: 392–395. doi:10.1038/s41586-018-0273-1 (Fish Tree of Life; used by source = "fishtree" and by rtrees for taxon = "fish".)

Upham, N. S., Esselstyn, J. A., & Jetz, W. (2019). Inferring the mammal tree: Species-level sets of phylogenies for questions in ecology, evolution, and conservation. PLOS Biology 17(12): e3000494. doi:10.1371/journal.pbio.3000494 (VertLife mammal posterior; used by rtrees for taxon = "mammal" with mammal_tree = "vertlife".)

Jin, Y. & Qian, H. (2019). V.PhyloMaker: an R package that can generate very large phylogenies for vascular plants. Ecography 42(8): 1353–1359. doi:10.1111/ecog.04434 (Vascular-plant mega-tree used by rtrees for taxon = "plant"; also the basis for the source = "vphylomaker" augmentation backend in reconcile_augment().)

Sanchez Reyes, L. L., O'Meara, B. C., Brown, J. W., & McTavish, E. J. (2024). DateLife: Leveraging databases and analytical tools to reveal the dated Tree of Life. Systematic Biology 73(2): 470–485. doi:10.1093/sysbio/syae015 (Used by source = "datelife" and by pr_date_tree().)

Methodology:

Chang, J., Rabosky, D. L., & Alfaro, M. E. (2019). Estimating diversification rates on incompletely sampled phylogenies: Theoretical concerns and practical solutions. Systematic Biology 69(3): 602–611. doi:10.1093/sysbio/syz081 (Stochastic polytomy resolution behind fishtree_complete_phylogeny() for n_tree > 1.)

Michonneau, F., Brown, J. W., & Winter, D. J. (2016). rotl: an R package to interact with the Open Tree of Life data. Methods in Ecology and Evolution 7(12): 1476–1481. doi:10.1111/2041-210X.12593 (TNRS preflight and source = "rotl".)

See also

reconcile_tree() / reconcile_data() for producing the reconciled species list that feeds this function; reconcile_apply() for combining the returned phylo with the data frame ready for analysis; reconcile_augment() for filling gaps in an existing tree (a tree-aware alternative to retrieving a fresh tree); pr_date_tree() for time-calibrating an existing topology; pr_cite_tree() for formatting citations for a tree result; pr_tree_compare() for comparing two or more retrieved trees; pr_get_tree_status() for checking which backends are installed and reachable; pr_tree_cache_dir() / pr_tree_cache_status() / pr_tree_cache_clear() for managing the on-disk cache. The companion package pigauto consumes a multiPhylo directly via multi_impute_trees() for posterior-tree PCMs; request a posterior sample with n_tree > 1.

Examples

if (interactive()) {
  # Example 1: birds via clootl (Clements taxonomy). Uses the
  # bundled AVONET subset (657 species placed in the Clements tree).
  data(avonet_subset)
  if (requireNamespace("clootl", quietly = TRUE)) {
    res <- pr_get_tree(avonet_subset, species_col = "Species1",
                       source = "clootl")
    ape::Ntip(res$tree)        # species placed in the tree
    head(res$unmatched)        # names clootl could not resolve
  }

  # Example 2: fish via fishtree (Rabosky et al. 2018, time-calibrated)
  if (requireNamespace("fishtree", quietly = TRUE)) {
    res <- pr_get_tree(c("Salmo salar", "Esox lucius", "Gadus morhua"),
                       source = "fishtree")
    res$tree
  }

  # Example 3: anything via rotl (universal, network)
  if (requireNamespace("rotl", quietly = TRUE)) {
    res <- pr_get_tree(c("Homo sapiens", "Pan troglodytes",
                         "Mus musculus"),
                       source = "rotl")
    res$tree
  }

  # Example 4: posterior of fish trees (50 trees, for multi-tree PCMs)
  if (requireNamespace("fishtree", quietly = TRUE)) {
    res <- pr_get_tree(c("Salmo salar", "Esox lucius"),
                       source = "fishtree", n_tree = 50)
    class(res$tree)            # "multiPhylo"
  }
}