Graft missing species onto a phylogenetic tree (genus-level placement)
Source: R/reconcile_augment.R
When a reconciliation identifies species that are present in your data
but missing from the tree, reconcile_augment() attaches each missing
species as sister to a congener — i.e., a species in the same genus
already present in the tree. The result is a tree that contains every
species in your dataset, at the cost of making a strong assumption
about where the new tips sit.
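The congener lookup described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper genus_of() is hypothetical, and it assumes the genus is simply the first word of a binomial, with either a space or an underscore as separator.

```r
# Hypothetical helper: genus = everything before the first space or underscore.
genus_of <- function(x) sub("[ _].*$", "", x)

tip_labels <- c("A_a", "A_b", "B_c")      # tips already in the tree
missing    <- c("A missing", "C absent")  # species in the data but not the tree

# For each missing species, collect the tips that share its genus.
congeners <- lapply(missing, function(sp) {
  tip_labels[genus_of(tip_labels) == genus_of(sp)]
})
names(congeners) <- missing

# A species can be grafted only if at least one congener is in the tree.
placeable <- lengths(congeners) > 0
```

Here "A missing" has two congeners and can be placed, while "C absent" has none and would be reported as skipped.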
Arguments
- reconciliation
A reconciliation object, typically from reconcile_tree().
- tree
An ape::phylo object. Must be the same tree used to build reconciliation (or a tree with the same tip set). For source = "rtrees", this is passed to rtrees as the user-supplied backbone (tree_by_user = TRUE).
- where
A length-1 character vector. Where to attach each new tip (only used when source = "internal"; ignored otherwise):
"genus" (default): Attach as sister to a single congener chosen at random from the genus. Recommended when the genus has only one or two representatives in the tree, or when you want variation across runs for sensitivity analyses.
"near": Attach at the most recent common ancestor (MRCA) of all congeners in the tree. Better when the genus is well-represented, because the new tip is not arbitrarily tied to one sister taxon.
- branch_length
A length-1 character vector. How to set the terminal branch length of each newly added tip (only used when source = "internal"; ignored otherwise, because rtrees sets its own branch lengths):
"congener_median" (default): Median terminal branch length of the species' congeners, used as a typical value for how long ago species in the genus diverged. Recommended for time-calibrated trees because it preserves approximate branch-length scale.
"half_terminal": Half the sister tip's terminal branch. A conservative alternative that places the new tip as a recent split from its sister. Useful when the genus is sparsely sampled and the median is unreliable.
"zero": Zero-length branch, producing a polytomy with the sister taxon (or MRCA). Use for exploratory sensitivity checks where you want to see the effect of adding species without assuming any divergence time.
When the input tree is ultrametric, each grafted tip's terminal edge is adjusted after placement so the augmented tree stays ultrametric, a requirement of phylogenetic comparative methods. branch_length then governs the initial graft only; "zero" is exempt, since it asks for a polytomy by construction.
- seed
A length-1 integer or NULL. When non-NULL and source = "internal", a fixed seed for the random congener choice when where = "genus", making the call reproducible. When NULL (default), the session's current RNG state is used so results vary across runs, which is useful for sensitivity analyses that explore the variation introduced by the random choice. Set to a fixed integer in real analyses so results are reproducible. The seed is scoped to this call: the session RNG state is saved before and restored after, so subsequent random draws in your script are unaffected. (For source = "rtrees", set the seed in your script before calling reconcile_augment(); rtrees does not accept a seed argument.)
- quiet
Logical. Suppress progress messages? Default FALSE.
- source
A length-1 character vector. Which grafting backend to use. One of "internal" (default), "rtrees", "vphylomaker", or "uphylomaker". See “Choosing a source”.
- taxon
A length-1 character vector. Required when source = "rtrees". One of "bird", "mammal", "fish", "amphibian", "reptile", "plant", "shark_ray", "bee", "butterfly". Ignored for "internal" and "vphylomaker".
- check_ultrametric
Logical. After grafting, check that the result is ultrametric and warn if not. Default TRUE. The "rtrees", "vphylomaker", and "uphylomaker" backends produce ultrametric trees by design; the "internal" backend does too when the input tree was ultrametric and branch_length is "congener_median" or "half_terminal", but not when branch_length = "zero" (which produces zero-length tip edges that break ultrametricity by construction).
- ...
Additional arguments forwarded to the chosen backend: rtrees::get_tree() for source = "rtrees" (e.g. scenario, n_tree); V.PhyloMaker2::phylo.maker() for source = "vphylomaker" (e.g. scenarios = "S3", nodes.type); U.PhyloMaker::phylo.maker() for source = "uphylomaker" (e.g. gen.list, scenario). Ignored when source = "internal".
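The quantities behind where, branch_length, and seed can be sketched with ape on a toy tree. This is an illustrative sketch under stated assumptions, not the package's internal code: pick_sister() is a hypothetical helper, and the save-and-restore of .Random.seed is just one common way to scope a seed to a single call.

```r
library(ape)

tree <- read.tree(text = "((A_a:1,A_b:1):1,B_c:2);")
congeners <- c("A_a", "A_b")  # tips sharing the missing species' genus

# Terminal edges are the rows of tree$edge whose child is a tip.
is_terminal  <- tree$edge[, 2] <= Ntip(tree)
terminal_len <- setNames(tree$edge.length[is_terminal],
                         tree$tip.label[tree$edge[is_terminal, 2]])

# Hypothetical helper: random congener choice with a call-scoped seed.
pick_sister <- function(congeners, seed = NULL) {
  if (!is.null(seed)) {
    had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
    old_seed <- if (had_seed) get(".Random.seed", envir = globalenv()) else NULL
    on.exit({
      if (had_seed) {
        assign(".Random.seed", old_seed, envir = globalenv())
      } else {
        rm(".Random.seed", envir = globalenv())
      }
    }, add = TRUE)
    set.seed(seed)
  }
  sample(congeners, 1)
}

sister    <- pick_sister(congeners, seed = 42)  # where = "genus"
mrca_node <- getMRCA(tree, congeners)           # where = "near"

# Initial terminal branch length under each rule.
candidates <- c(
  congener_median = median(terminal_len[congeners]),
  half_terminal   = terminal_len[[sister]] / 2,
  zero            = 0
)
```

On this tree both congeners have terminal branches of length 1, so the median rule and the choice of sister make no difference; on real trees they usually will.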
Value
A list with:
- tree
The augmented phylo object (or multiPhylo when source = "rtrees" returns a posterior sample).
- original
The original (unmodified) phylo object, for easy comparison.
- augmented
A tibble documenting each added species: species, genus, placed_near (sister tip / MRCA node / rtrees placement note), branch_length, method, n_congeners. For source = "rtrees", branch_length and n_congeners are NA because the backend chooses them.
- skipped
A tibble of species that could not be placed, with the reason (e.g. "No congener in tree", "rtrees did not place this species").
- meta
Provenance metadata: source, placement strategy, branch length rule, counts; for source = "rtrees", includes a backend_meta sub-list with the taxon and the number of grafted tips.
When to use this
Tip-grafting is an exploratory convenience, not a substitute for a properly inferred phylogeny. All source modes (see below) make strong placement assumptions that are often wrong in detail. Use it to keep exploratory PCMs running while you decide how to handle orphan species, and always:
- Report exactly which species were augmented (see $augmented in the return value).
- Run sensitivity analyses with and without the augmented tips.
- Prefer a published imputed phylogeny (e.g. the PhyloMaker or TACT approaches) when grafting many species.
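One way to set up the with/without comparison is to drop the grafted tips again with ape::drop.tip() and run the same analysis on both trees. The sketch below uses a hand-written augmented tree; in practice the tip names would come from the augmented table, and the assumption that they match the tree's tip labels (underscores versus spaces) should be checked first.

```r
library(ape)

# Toy augmented tree: A_missing was grafted as sister to A_a.
aug_tree <- read.tree(
  text = "(((A_a:0.5,A_missing:0.5):0.5,A_b:1):1,B_c:2);"
)

# Assumed: names of the grafted tips, in tip-label form.
added <- c("A_missing")

# Tree without the augmented tips; drop.tip() also removes the
# now-redundant internal node left behind.
base_tree <- drop.tip(aug_tree, added)

# Fit the same model on aug_tree and on base_tree, then compare estimates.
trees <- list(with_grafts = aug_tree, without_grafts = base_tree)
```

If conclusions change materially between the two trees, the grafted placements are driving the result and deserve closer scrutiny.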
Choosing a source
"internal" (default): Genus-level placement using only your tree (no external dependencies). Each missing species is attached as sister to a congener (or at the congeneric MRCA). Fast and reproducible, but only works when the genus is already represented in the tree, and assumes the new tip diverged in roughly the same way as its congeners.
"rtrees": Delegates the grafting to the rtrees mega-tree machinery via rtrees::get_tree(tree_by_user = TRUE). Uses your tree as the backbone and lets rtrees place each missing species using genus / family information from a taxon-specific reference tree. Requires taxon and the GitHub-only rtrees package (https://daijiang.github.io/rtrees/). Helpful when the genus is absent from your tree but present in the rtrees reference, a case the internal mode would skip.
"vphylomaker": Plant-only alternative to "rtrees" via either of the GitHub packages V.PhyloMaker2 (https://github.com/jinyizju/V.PhyloMaker2, preferred when installed; updated and enlarged version) or V.PhyloMaker (https://github.com/jinyizju/V.PhyloMaker, used as a fallback; original 2019 version). Calls phylo.maker(sp.list, tree, scenarios = ...) with your tree as the backbone. Use this when you want explicit control over the V.PhyloMaker placement scenario ("S1", "S2", or "S3"; see Jin & Qian 2019/2022); otherwise "rtrees" with taxon = "plant" is simpler.
"uphylomaker": Universal (plants + animals) variant of V.PhyloMaker, via the GitHub package U.PhyloMaker (https://github.com/jinyizju/U.PhyloMaker). Same phylo.maker convention but takes a gen.list (a genus-family lookup) so it can graft non-plant taxa as well as plants. Use this when your tree spans multiple kingdoms and you want the V.PhyloMaker placement strategy.
Use pr_get_tree() when you have only a species list and need a
candidate tree from scratch (rotl, clootl, or rtrees). Use
reconcile_augment() when you already have a tree and want to fill
the gaps.
References
Paradis, E. & Schliep, K. (2019). ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526–528. doi:10.1093/bioinformatics/bty633
Augmentation backends:
Jin, Y. & Qian, H. (2019). V.PhyloMaker: an R package that can
generate very large phylogenies for vascular plants.
Ecography 42(8): 1353–1359. doi:10.1111/ecog.04434
(source = "vphylomaker", fallback path.)
Jin, Y. & Qian, H. (2022). V.PhyloMaker2: an updated and enlarged R
package that can generate very large phylogenies for vascular plants.
Plant Diversity 44(4): 335–339.
doi:10.1016/j.pld.2022.05.005
(source = "vphylomaker", preferred path.)
Jin, Y. & Qian, H. (2023). U.PhyloMaker: an R package that can
generate large phylogenetic trees for plants and animals.
Plant Diversity 45(3): 347–352.
doi:10.1016/j.pld.2022.12.007
(source = "uphylomaker".)
See also
reconcile_tree() for the reconciliation step;
reconcile_apply() for the non-augmenting alternative (prune data
and tree to the intersection); pr_get_tree() for retrieving a
candidate tree from external resources when you don't have a tree
yet; pr_date_tree() for time-calibrating an existing topology;
pr_cite_tree() for formatting tree provenance citations. The
companion package
pigauto consumes the
resulting tree (or multiPhylo) directly via
multi_impute_trees() for posterior-tree PCMs.
Other reconciliation functions:
reconcile_apply(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_suggest(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
# --- Example 1: genus-level placement with congener_median branch lengths ---
x <- data.frame(species = c("A a", "A missing", "B c", "C absent"))
tree <- ape::read.tree(text = "((A_a:1,A_b:1):1,B_c:2);")
result <- reconcile_tree(x, tree, x_species = "species",
                         authority = NULL, quiet = TRUE)
aug <- reconcile_augment(result, tree, seed = 42, quiet = TRUE)
#> Warning: Tree returned by "internal" is not strictly ultrametric.
#> ℹ Most PCM methods (PGLS, BM, OU, etc.) assume ultrametric trees.
#> → To force: `phytools::force.ultrametric(result$tree)` or
#> `ape::chronos(result$tree)`.
#> • To suppress this check: pass `check_ultrametric = FALSE`.
# Compare original vs augmented tree
cat("Original tips:", ape::Ntip(tree), "\n")
#> Original tips: 3
cat("Augmented tips:", ape::Ntip(aug$tree), "\n")
#> Augmented tips: 4
cat("Added:", nrow(aug$augmented), "| Skipped:", nrow(aug$skipped), "\n")
#> Added: 1 | Skipped: 1
# Inspect which species were added and where they were placed
head(aug$augmented[, c("species", "genus", "placed_near",
                       "branch_length", "n_congeners")])
#> # A tibble: 1 × 5
#> species genus placed_near branch_length n_congeners
#> <chr> <chr> <chr> <dbl> <int>
#> 1 A missing A A a 1 2
# Species skipped (no congener in tree)
head(aug$skipped)
#> # A tibble: 1 × 3
#> species genus reason
#> <chr> <chr> <chr>
#> 1 C absent C No congener in tree
# --- Example 2: MRCA placement with zero-length branches ---
aug_near <- reconcile_augment(result, tree,
                              where = "near",
                              branch_length = "zero",
                              seed = 42, quiet = TRUE)
cat("\nMRCA placement (zero branches):\n")
#>
#> MRCA placement (zero branches):
cat(" Added:", nrow(aug_near$augmented), "\n")
#> Added: 1
# Compare: MRCA placement shows genus-level context
head(aug_near$augmented[, c("species", "placed_near", "method")])
#> # A tibble: 1 × 3
#> species placed_near method
#> <chr> <chr> <chr>
#> 1 A missing MRCA of A a, A b near/0
if (FALSE) { # \dontrun{
# --- Example 3: delegate grafting to rtrees ---
# Useful when the genus is missing from your tree but present in
# the rtrees taxon-specific reference tree.
aug_rt <- reconcile_augment(result, tree,
                            source = "rtrees",
                            taxon = "bird",
                            quiet = TRUE)
nrow(aug_rt$augmented)              # how many species were placed
aug_rt$meta$backend_meta$n_grafted  # number of grafted tips reported by the backend
} # }