prepR4pcm 1.0.0
- Stable Software Note release. Version 1.0.0 marks the package as ready for the Ecography Software Note submission (release date 2026-06-16). No API-breaking changes were introduced relative to 0.5.1.
prepR4pcm 0.5.1
CRAN preparation. Release metadata, README installation guidance, and bundled-data citation links were updated for the first CRAN submission.
pkgdown publishing guard. The pkgdown build wrapper now removes loose agent-instruction pages from the generated site and search index before GitHub Pages deployment.
pkgdown workflow fix. dependencies: '"most"' in .github/workflows/pkgdown.yaml tells pak to skip Enhances (i.e. datelife) during lockfile resolution. The workflow_dispatch rebuild no longer fails on “Can’t find package called datelife”. The day-to-day path is unaffected: GitHub Pages still serves the docs straight from main:/docs/.
comparing-tree-backends.Rmd gains a “Branch lengths and time-calibration” section with a per-backend comparison table and a practical decision tree for choosing between fishtree / clootl / rtrees / datelife / Grafen pseudo-time when you need real divergence-time branch lengths. It also surfaces the alternative path when datelife isn’t installable on a given system.
prepR4pcm 0.5.0
This release adds two opt-in Global Names Architecture backends, a phylogenetic-meta-analysis workflow, and a substantial round of tree-handling audit metadata. All existing API contracts are preserved unless explicitly called out under Breaking changes.
New features
Optional Global Names backends
- pr_normalize_names(parser = "gnparser") routes parsing through rgnparser, the R wrapper for the gnparser Go binary. Set parser = "gnparser" for hardened parsing of hybrid signs, complex multi-author year strings, and Open Tree homonym / rank flag parentheticals. Returns the same shape and normalisation_log attribute as the internal cascade, so the two are drop-in interchangeable. Default stays parser = "internal" (zero-dependency).
- authority = "gnverifier" is accepted by every reconcile_* function (reconcile_data(), reconcile_tree(), reconcile_multi(), reconcile_to_trees(), reconcile_trees()). It routes the synonym stage through the Global Names verifier over HTTP (~100 sources in one round-trip) instead of a local taxadb database. No ~100 MB local cache; requires network access and httr2. Default stays authority = "col" (taxadb).
Tree-handling audit metadata
- pr_get_tree() / pr_date_tree() now return result$mapping: one row per unique input species, with the user-facing name, normalised name, backend query name, returned tree tip, match type, and rtrees placement status when available. Replaces ad-hoc reconstruction from $matched + $unmatched + backend-specific metadata (#73).
- TNRS match metadata in result$mapping: four new columns (tnrs_number_matches, tnrs_is_synonym, tnrs_approximate_match, tnrs_flags) carry the structured output of rotl::tnrs_match_names(). Homonyms (tnrs_number_matches > 1) trigger a one-shot warning naming the affected species.
- result$backend_meta$placement (rtrees only): per-input table with input_name, tree_name, placement_status (exact / genus_added / family_added / skipped / unmatched). Filter to placement_status == "exact" to drop grafted tips from a sensitivity analysis (#74).
- TNRS substitutions are now auditable: result$backend_meta$tnrs_replacements lists every name TNRS changed; a one-shot warning shows the first three. Silent name correction is no longer possible (#72).
- Multi-tree reporting fields on pr_get_tree(): backend_meta gains n_requested, tip_set_consistent, and dropped_per_tree (#76).
Phylogenetic meta-analysis path
- pr_get_tree() gains resolve_polytomies and branch_lengths arguments. When the topology comes from rotl, pr_get_tree(species, source = "rotl", resolve_polytomies = TRUE, branch_lengths = "grafen") returns a tree ready for metafor::rma.mv(). Defaults preserve back-compat.
- New: pr_phylo_cor(tree), a thin wrapper around ape::vcv(tree, corr = TRUE) that turns the tree into the phylogenetic correlation matrix accepted by metafor::rma.mv(), MCMCglmm::MCMCglmm(), glmmTMB::glmmTMB(), and brms::brm() as a random-effect structure. Accepts phylo, multiPhylo, or pr_tree_result input.
- New vignette: “Phylogenetic meta-analysis with rotl + prepR4pcm”. End-to-end walk from species names through rotl topology, bifurcating + Grafen branches, correlation matrix, to metafor::rma.mv(). Uses a 13-species cross-taxon subset of Pottier et al. 2022’s thermal-tolerance dataset.
Bug fixes
- reconcile_augment() keeps grafted trees ultrametric. When the input tree is ultrametric and branch_length != "zero", a post-graft correction ensures the augmented tips reach the present, so downstream PGLS / BM / OU models see a valid tree. Regression test pinned (PR #105, Losia Lagisz).
- pr_get_tree() matched / unmatched accounting now enforces three invariants: matched ⊆ unique(input), unmatched ⊆ unique(input), and |matched| + |unmatched| == |unique(input)|. The matched slot preserves the user’s original input format (underscores stay underscores). Previously, TNRS-resolved names could leak through (#73).
- pr_get_tree(source = "clootl") is now ~250× faster on large bird species lists (3.6 s vs > 15 min for 10,597 birds; #70). TNRS preflight no longer runs for clootl by default (clootl uses Clements taxonomy, not OTL); force = TRUE is passed to clootl::extractTree() so a single unmatched species doesn’t error out the whole call.
- pr_get_tree(source = "clootl") accepts underscore-separated names by converting them to the space-separated form clootl expects, while preserving the user’s originals in $matched / $unmatched (#75).
- pr_get_tree(source = "clootl", n_tree = 1) no longer requires library(clootl): the wrapper temporarily attaches the namespace for the duration of the call.
- reconcile_apply() validates species_col before filtering data, so a typo errors clearly instead of silently returning zero data rows.
- reconcile_multi() no longer undercounts dataset-specific matches when the same species appears in different formats across datasets (e.g. Homo_sapiens vs Homo sapiens). The cascade gains a multi_x = TRUE mode; the mapping gains the documented in_<dataset> logical columns (#10, Ayumi Mizuno).
- reconcile_summary() no longer auto-prints when assigned. The formatted report lives on the returned object’s formatted_text slot and renders via print.reconciliation_summary() (#12, Ayumi Mizuno).
- Trailing parenthetical leak fixed in name normalisation. The internal cascade now strips any trailing parenthetical, catching Open Tree’s (species in domain Eukaryota) and similar TNRS leak-through.
- pr_lookup_authority() and pr_ensure_db() no longer print raw {.pkg ...} / {.code ...} cli template strings; errors route through cli::cli_abort() (#4, Eduardo Santos).
- Removed V.PhyloMaker3 from Suggests and Remotes: the repo doesn’t exist. The vphylomaker augment backend now prefers V.PhyloMaker2 with a fallback to V.PhyloMaker.
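The three matched / unmatched invariants that pr_get_tree() now enforces can be written as a small check. The following is a minimal base-R sketch, not the package's internal code; check_match_invariants() and its arguments are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of prepR4pcm): checks the three invariants
# on plain character vectors of species names.
check_match_invariants <- function(input, matched, unmatched) {
  u <- unique(input)
  c(
    matched_subset   = all(matched %in% u),    # matched is a subset of unique(input)
    unmatched_subset = all(unmatched %in% u),  # unmatched is a subset of unique(input)
    counts_add_up    = length(unique(matched)) + length(unique(unmatched)) == length(u)
  )
}

input     <- c("Homo_sapiens", "Pan_troglodytes", "Homo_sapiens", "Fake_species")
matched   <- c("Homo_sapiens", "Pan_troglodytes")
unmatched <- "Fake_species"

ok <- check_match_invariants(input, matched, unmatched)

# A resolved spelling ("Homo sapiens") that replaces the user's original
# ("Homo_sapiens") violates the first invariant.
leaky <- check_match_invariants(input, c("Homo sapiens", "Pan_troglodytes"), unmatched)
```

In this toy example the second call flags the kind of leak described above: the counts still add up, but a name that the user never supplied has entered the matched set.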
Documentation
- New cover prose on the README and landing page, naming both halves of the prerequisite that prepR4pcm solves: reconcile names and retrieve / date trees. New “Quick example — fetching a tree” snippet.
- Three new vignettes (in addition to the meta-analysis one above):
  - “Posterior-tree pipeline (prepR4pcm + pigauto)”: chaining reconcile_data() → pr_get_tree() → pigauto::multi_impute_trees().
  - “Comparing tree backends”: when do they agree?
  - “Assembling mammal trait databases for PCMs” (db-assembly-workflow_mammals, Santiago Ortega; closes #11): combining Amniote / PanTHERIA / TetrapodTraits, reconciling, applying manual corrections, producing a tree-aligned species-level data frame.
- ?pr_get_tree and ?pr_tree_compare @references are now spelled out with DOIs (#79, #80, Jimuel Celeste, Jr.). Every author-year citation in the help text has a full reference (Jetz, Rabosky, Upham, Sanchez Reyes, Chang, Michonneau, Kuhner & Felsenstein, Robinson & Foulds).
- ?mammal_tree_example cites Upham et al. 2019 (#11). The bundled 5,987-tip mammal phylogeny is documented as a subset of the VertLife mammal phylogeny, with the 76 X_-prefixed Mesozoic stem-mammal fossils described as products of the Upham et al. “backbone-and-patch” framework.
- ?pr_date_tree clarifies n_dated > 1 semantics: one topology with n_dated = 50 returns 50 chronograms sharing the topology but differing in branch lengths (one per DateLife source), not 50 different topologies.
- getting-started.Rmd makes explicit that col, itis, gbif, ncbi, ott, itis_test are taxadb authority names, not R packages, and that ott (Open Tree Taxonomy) differs from rotl (the R package that retrieves trees from Open Tree of Life). Expanded mismatch-types section with worked examples for formatting / synonymy / typos.
- Several rounds of README and vignette feedback from Mal Lagisz (#53, #61, #64): clearer Features definitions, typical-workflow diagram, Quick example, sentence-level revisions throughout. Detailed feedback from Ayumi Mizuno (#1) seeded the multi-row species and asymmetric datasets vignette subsections.
- Logo redesign: tree tips align with matrix rows. Favicons regenerated.
- New author: Bhavya Jain added to Authors@R and the citation block.
Build, CI, and dependencies
- httr2 and rgnparser added to Suggests, for the new authority = "gnverifier" and parser = "gnparser" backends respectively.
- CI workflow trimmed to pull_request + workflow_dispatch only, Linux-only by default. The workflow_dispatch trigger has an os input; set it to full before a release / CRAN submission to run the full macOS / Windows / Linux × 3 R-version matrix. Saves ~85% of GitHub Actions minutes.
- datelife moved from Suggests + Remotes to Enhances: datelife was archived from CRAN in 2024 and has a heavy transitive dependency tree that pak’s resolver can’t install on clean CI. Users opt in with pak::pak("phylotastic/datelife").
- piggyback allowlisted in the Suggests-vs-Remotes consistency test.
- dplyr, readr, and stringr moved from Suggests to Imports (used routinely in the package).
Tests
- tests/testthat/test-pr_lookup_gnverifier.R: gnverifier helper (happy-path mix, network-failure degradation, db_version warning, mismatched-response shape, session-cache hit, missing-httr2 abort, live integration against verifier.globalnames.org).
- tests/testthat/test-pr_normalize.R: gnparser backend (mocked + live).
- Regression tests for the rtrees placement table, the TNRS metadata columns, the multi-tree reporting fields, the clootl performance fix, the ultrametric-preserving graft, and the matched/unmatched invariants.
- Existing combinatorial test layer (252 test_that() blocks, ~2,737 expectations) carried forward unchanged.
prepR4pcm 0.4.0
This release closes out the multi-round tree-handling overhaul started at issue #42 (Ayumi Mizuno) and tracked at issue #48. Across Rounds 4 - 8 the package gained:
- Five tree-retrieval backends (rotl, rtrees, clootl, fishtree, datelife) unified under pr_get_tree(..., n_tree, source = "auto").
- Three augmentation backends (rtrees, V.PhyloMaker3/V.PhyloMaker2, U.PhyloMaker) on reconcile_augment().
- A standalone dating function (pr_date_tree()) for adding chronogram calibrations to existing topologies.
- Per-tree provenance metadata that pairs every tree in a multiPhylo with its citation, calibration method, and tip count – designed for downstream consumption by pigauto::multi_impute_trees().
- pr_cite_tree() for formatting citations (text / markdown / bibtex).
- pr_tree_compare() with bipartition-matched branch-length correlation.
- pr_get_tree_status() for backend health probing.
- An on-disk cache (pr_tree_cache_*) for repeat retrievals.
- Three vignettes covering the cross-package pipeline, backend comparison, and the original workflows.
Round 8: V.PhyloMaker / U.PhyloMaker + bipartition correlation + ultrametric check
- reconcile_augment() gains two new sources:
  - source = "vphylomaker" – plant-only alternative to rtrees, via the GitHub packages V.PhyloMaker3 (preferred) or V.PhyloMaker2 (fallback). Both share the same phylo.maker(sp.list, tree, scenarios) API. Use this when you want explicit V.PhyloMaker scenario control (S1 / S2 / S3, see Jin & Qian 2019/2022).
  - source = "uphylomaker" – universal (plants + animals) alternative via U.PhyloMaker (Jin & Qian 2023, Plant Diversity). Same phylo.maker convention plus a gen.list argument; the helper auto-loads U.PhyloMaker::nodes.info.1 when not supplied.
- Bipartition-matched branch-length correlation in pr_tree_compare(). The previous Pearson-on-sorted-edges approximation is replaced with a proper bipartition match: for each edge in tree A, the corresponding edge in tree B is the one splitting the same set of tips. Edges whose bipartition isn’t shared are dropped from the correlation.
- check_ultrametric = TRUE argument on pr_get_tree(), pr_date_tree(), and reconcile_augment(). After producing the tree, runs ape::is.ultrametric() and warns if the backend was expected to produce an ultrametric tree but didn’t. Skipped for rotl (synthesis topology, no real branch lengths) and for the internal augment with branch_length = "zero" (which breaks ultrametricity by design). Does not modify the tree – to force ultrametricity, call phytools::force.ultrametric() or ape::chronos() on the result yourself.
- TimeTree.org REST client deferred as a non-goal. TimeTree’s TOS restricts bulk querying for non-commercial use; a REST wrapper isn’t a clean fit for prepR4pcm’s “install-and-go” backend model. Users who need TimeTree data can fetch it manually and feed the resulting topology into pr_date_tree() (or skip dating entirely).
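The bipartition match behind the branch-length correlation can be illustrated without any tree package. The sketch below is a simplified base-R illustration, not the pr_tree_compare() implementation: each edge is described by the tips on one side of it, and split_key(), edge_table(), and shared_edge_cor() are hypothetical helper names.

```r
# Canonical key for a bipartition. A split and its complement describe the
# same edge, so always key on the side containing the alphabetically first tip.
split_key <- function(side, all_tips) {
  all_tips <- sort(all_tips)
  side <- sort(intersect(all_tips, side))
  if (!(all_tips[1] %in% side)) side <- setdiff(all_tips, side)
  paste(side, collapse = "|")
}

# One row per edge: bipartition key plus branch length.
edge_table <- function(sides, lengths, all_tips) {
  data.frame(
    key = vapply(sides, split_key, character(1), all_tips = all_tips),
    length = lengths,
    stringsAsFactors = FALSE
  )
}

# Keep only edges whose bipartition occurs in both trees, then correlate.
shared_edge_cor <- function(ta, tb) {
  shared <- merge(ta, tb, by = "key", suffixes = c("_a", "_b"))
  list(n_shared = nrow(shared),
       r = stats::cor(shared$length_a, shared$length_b))
}

tips <- c("A", "B", "C", "D", "E")

# Toy edge lists (illustrative numbers only). In tree B the {A, B} edge is
# written from the other side, as {C, D, E}.
tree_a <- edge_table(list(c("A", "B"), c("D", "E"), "A", "B"),
                     c(1.0, 2.0, 0.5, 0.7), tips)
tree_b <- edge_table(list(c("C", "D", "E"), c("C", "D"), "A", "B"),
                     c(1.2, 0.9, 0.4, 0.8), tips)

res <- shared_edge_cor(tree_a, tree_b)
```

Here the {D, E} edge of tree A and the {C, D} edge of tree B have no counterpart in the other tree, so they are dropped before the correlation is computed, which is the behaviour the changelog entry describes.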
Round 7: documentation polish and CI cleanup
- New vignette: “Comparing tree backends – when do they agree?” Walks through pr_tree_compare() end-to-end, including how to read the pairwise Jaccard / Robinson-Foulds / branch-length matrices and what to do for each disagreement pattern. Closes the docs gap left by Round 6.
- Remotes: simplification. Daijiang Li merged a Remotes: daijiang/megatrees entry upstream in rtrees (daijiang/rtrees#10), so we no longer need to carry the megatrees Remotes entry ourselves. Dropped from DESCRIPTION and the .transitive_remotes_allowlist test fixture.
- One-shot rotl-missing TNRS warning. Previously the warning fired once per backend call; now it fires once per session via options(prepR4pcm.tnrs_warning_shown = TRUE). Mainly a fix for source = "auto", which calls each candidate backend in turn.
- Round 7 edge-case + integration tests. New tests cover corrupt cache files, missing-key reads, cache invalidation by taxon and n_tree, tree comparisons across exotic shapes (no edge lengths, 3+ trees, named arguments), min_match boundary values, end-to-end pipelines including pr_get_tree -> pr_cite_tree, multi-tree provenance preservation, cache save/load round-trip, and the auto dispatcher’s auto_attempts / auto_chose metadata.
Round 6: tree-handling UX polish
- Local on-disk cache for pr_get_tree(). Pass cache = TRUE to memoise the request on disk; subsequent identical calls are served from the cache instead of re-querying the backend. New companion functions:
  - pr_tree_cache_dir(path = NULL) – get/set the cache directory. Defaults to tools::R_user_dir(); can be set to a project-local path so the cache travels with the analysis.
  - pr_tree_cache_status() – list cache entries with timestamps and sizes.
  - pr_tree_cache_clear(confirm, source) – wipe the cache; optionally restrict to a single backend.
- pr_get_tree_status(check_network) health probe. One-call report listing every backend with installed / version / needs_network / reachable / install hint. Useful for first-time users figuring out which backends are available.
- pr_tree_compare(...) for comparing trees. Tip-set Jaccard, Robinson-Foulds distance on the shared subtree, branch-length agreement. Accepts phylo, multiPhylo, or pr_tree_result inputs (positional or named).
- TNRS preflight for non-TNRS backends. New tnrs argument on pr_get_tree(): "auto" (default; runs TNRS for clootl + fishtree to lift their match rates), "always" (run regardless), "never" (skip). When rotl is not installed, the preflight is skipped and a one-shot warning is emitted.
- source = "auto" fall-through dispatcher. Tries installed backends in priority order; returns the first that resolves at least min_match of the species (default 0.8) or the best of the lot if none meets the threshold. New min_match argument controls the threshold.
- digest added to Suggests (used by the new cache-key hashing).
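The fall-through logic of source = "auto" can be sketched in a few lines of base R. This is an illustration of the rule stated above (first backend reaching min_match wins, otherwise the best match rate), not the package's dispatcher; pick_backend() and the mock backends are hypothetical stand-ins that simply return the species they claim to resolve.

```r
# Hypothetical mock backends, listed in priority order. Each returns the
# subset of species it can resolve.
backends <- list(
  fishtree = function(species) species[species %in% c("Danio rerio")],
  rtrees   = function(species) species[species %in% c("Danio rerio", "Homo sapiens",
                                                      "Mus musculus")],
  rotl     = function(species) species
)

pick_backend <- function(species, backends, min_match = 0.8) {
  species <- unique(species)
  rates <- numeric(0)
  for (name in names(backends)) {
    matched <- backends[[name]](species)
    rates[name] <- length(matched) / length(species)
    # First backend that reaches the threshold wins.
    if (rates[[name]] >= min_match) {
      return(list(chosen = name, rate = rates[[name]], attempts = rates))
    }
  }
  # Nothing met the threshold: fall back to the best match rate seen.
  best <- names(rates)[which.max(rates)]
  list(chosen = best, rate = rates[[best]], attempts = rates)
}

species <- c("Danio rerio", "Homo sapiens", "Mus musculus", "Gallus gallus")

res_default  <- pick_backend(species, backends)                   # threshold 0.8
res_lenient  <- pick_backend(species, backends, min_match = 0.7)  # lower threshold
res_fallback <- pick_backend(species, backends[c("fishtree", "rtrees")])
```

With the default threshold only the last mock backend resolves enough species; lowering min_match lets an earlier, higher-priority backend win, and removing the last backend exercises the best-of-the-lot fallback.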
Round 5: posterior-tree support and the prepR4pcm -> pigauto pipeline
- pr_get_tree() gains a unified n_tree parameter so users can request a posterior sample (multiPhylo) when a backend supports one. Default n_tree = 1L (back-compat). When n_tree > 1:
  - "rotl" – still returns 1 (the synthesis tree); a one-shot warning explains that no posterior is available.
  - "rtrees" – n_tree is informational only; the number of returned trees is fixed by the selected mega-tree and any backend-specific arguments forwarded through ....
  - "clootl" – n_tree = 1 calls clootl::extractTree(); n_tree > 1 calls clootl::sampleTrees(count = n_tree) and requires the AvesData repo.
  - "fishtree" – switches to fishtree_complete_phylogeny() for a multi-tree stochastic polytomy resolution.
  - "datelife" – (new backend; see below) returns one chronogram per database source, capped at n_tree.
- New backend: pr_get_tree(source = "datelife"). Universal database of pre-computed chronograms (Sanchez Reyes et al. 2024, Syst. Biol. 73:470). Returns a single SDM-summary chronogram by default; n_tree > 1 returns a multiPhylo of per-source candidate chronograms. Backend lives in Suggests + Remotes (datelife was archived from CRAN in 2024 – install with pak::pak("phylotastic/datelife")).
- New function: pr_date_tree(tree, n_dated, ...). Wraps datelife::datelife_use() to time-calibrate an existing topology. Returns the same pr_tree_result shape as pr_get_tree() so downstream consumers (notably pigauto) treat retrieved and dated trees interchangeably.
- New function: pr_cite_tree(result, format = ...). Formats a citation block (text / markdown / bibtex) for a pr_tree_result, including per-tree source citations when the result is a multi-tree posterior. Helpful for paper methods sections, figure legends, and PR descriptions.
- Per-tree provenance metadata. Every pr_tree_result now carries backend_meta$tree_provenance: a list with one entry per returned tree (citation, calibration method, tip count). For multiPhylo results, tree[[i]] pairs naturally with backend_meta$tree_provenance[[i]]. Designed to feed pigauto::multi_impute_trees().
- Cross-package vignette: “From species names to a phylogenetic posterior – prepR4pcm + pigauto”. End-to-end walkthrough showing how to chain reconcile_data() -> pr_get_tree() (or pr_date_tree()) -> pigauto::multi_impute_trees() -> pooled inference via Rubin’s rules. The vignette uses mock data and eval = FALSE chunks for the parts that need pigauto / datelife installed.
- Cross-link with pigauto. ?pr_get_tree, ?pr_date_tree, and ?reconcile_augment mention pigauto in @seealso. The pkgdown reference index gains a “Sister package: pigauto” callout.
New features
- reconcile_augment() gains source = c("internal", "rtrees") (default "internal", the existing genus-level grafting behaviour) and taxon arguments. With source = "rtrees", grafting is delegated to rtrees::get_tree(tree_by_user = TRUE), which uses your tree as the backbone and lets rtrees’ taxon-specific reference tree place each missing species via genus / family information. Helpful when the genus is absent from your tree but present in rtrees’ reference – the internal mode would skip these. Returns the same result shape (with meta$source recording which backend was used). Refs #42 (Ayumi Mizuno).
- pr_get_tree() gains a fourth backend, source = "fishtree", exposing the time-calibrated fish-only phylogeny of Rabosky et al. (2018, Nature 559:392) via the CRAN package fishtree. Returns a chronogram by default; pass type = "phylogram" for the uncalibrated version. Captures fishtree’s own warning text into backend_meta$warnings so the matched/unmatched report is honest.
- pr_get_tree() connects a reconciled species list to an external phylogenetic resource and returns a pruned candidate tree plus a matching report (matched / unmatched / source / backend metadata). Four backends ship:
  - "rotl" – Open Tree of Life synthesis tree (universal coverage, via the CRAN package rotl).
  - "rtrees" – taxon-specific mega-trees (bird, mammal, fish, amphibian, reptile, plant, shark/ray, bee, butterfly), via the GitHub package rtrees (pak::pak("daijiang/rtrees")).
  - "clootl" – bird-only phylogenies in current Clements taxonomy, via the GitHub package clootl (pak::pak("eliotmiller/clootl")).
  - "fishtree" – fish-only time-calibrated phylogeny, via the CRAN package fishtree.

  Accepts a reconciliation object, a character vector, or a data frame as input. Each backend is loaded only on demand – asking for a backend you don’t have installed produces a helpful migration error with the install command. Refs #42 (Ayumi Mizuno).
What’s NOT in this round (deferred work)
For honesty / handoff so future contributors don’t lose track:
- #10 – reconcile_multi() may undercount dataset-specific matches when names differ only by formatting (Round 4 candidate; needs root-cause debug).
- #12 – reconcile_summary() prints even when assigned to a variable (Round 4 candidate; small UX fix, needs quiet / print_report argument).
- #14 – Ayumi’s documentation-feedback summary (Round 3).
- #15-#21, #26 – Per-function documentation feedback wave from pooherna and Sergio (Round 3; will be batched into a single doc-quality pass).
- #16 – Suggested Delhey citation correction. Requires verification before changing; current citation may already be correct.
- The remaining 22 of 37 Tier-4 defence-in-depth claim-parity tests outlined in the round-2 plan; they’re documented but not yet written.
- Round 5 work (separate tracking issue): a unified n_tree parameter on pr_get_tree() so each backend can return a posterior sample (multiPhylo) when one is available; a new pr_date_tree() function wrapping datelife::datelife_use() for time-calibrating user topologies; a datelife retrieval backend on pr_get_tree(); per-tree provenance metadata; and a cross-package vignette documenting the prepR4pcm -> pigauto pipeline for posterior-tree PCMs.
New vignette and example data
- New vignette “Assembling mammal trait databases for phylogenetic comparative models” (db-assembly-workflow_mammals), contributed by Santiago Ortega. Walks through combining three mammal trait sources (Amniote, PanTHERIA, TetrapodTraits), reconciling the unique species names against a phylogenetic tree, applying manual corrections, and collapsing the matched records into a model-ready species-level data frame aligned with a pruned tree. Closes #11.
- Four new bundled example datasets used by the new vignette: mammal_amniote_example (Myhrvold et al. 2015), mammal_pantheria_example (Jones et al. 2009), mammal_tetrapodtraits_example (Moura et al. 2024), and mammal_tree_example (a 5,987-tip mammal phylogeny, source to be confirmed with the contributor).
- dplyr, readr, and stringr moved from Suggests to Imports, reflecting that they are used routinely in the package’s vignettes and R code rather than being optional.
Breaking changes
- pr_valid_authorities() no longer lists iucn, tpl, fb, slb, or wd. Empirical testing against taxadb v22.12 (the database the package depends on) showed that iucn errors with a schema mismatch and the other four are not taxadb providers at all. Anyone passing one of these values was getting a hard error from inside taxadb; the call sites now produce a targeted migration message pointing users at the working authorities. (Identified in follow-up to #5, Ayumi Mizuno.)

  If you were passing one of the removed authorities, switch to one of "col", "itis", "gbif", "ncbi", or "ott", or pass authority = NULL to skip synonym resolution.
New features
- authority = "ott" (Open Tree Taxonomy) is supported again, after Round 1 dropped it on incomplete diagnosis. The original failure was at taxadb::td_create() with the default schema set c("dwc", "common"): taxadb v22.12 does not ship a common schema for OTT. We now restrict the schema to "dwc" (the only schema the cascade actually consumes), unblocking OTT and potentially other providers that lack a common schema. Re-closes #5 properly.
- authority = "itis_test" exposes taxadb’s small bundled testing dataset. Useful for examples and unit tests without a network round-trip.
Bug fixes
- pr_lookup_authority() and pr_ensure_db() no longer print raw {.pkg ...} / {.code ...} cli template strings in their error messages. The functions now route errors through cli::cli_abort(), which interprets the markup. Closes #4 (Eduardo Santos).
- The matching cascade no longer prints Stage 3/4: Synonym resolution (...) when authority = NULL, or Stage 4/4: Fuzzy matching (...) when fuzzy = FALSE. Stage numbering is computed from the active stages only, so a call with authority = NULL, fuzzy = FALSE reports Stage 1/2: Exact ... and Stage 2/2: Normalised .... Previously, the fixed Stage X/4 labels suggested matches were being made at synonym/fuzzy stages even when they were skipped. Closes #13 (Ayumi Mizuno).
- reconcile_tree() and reconcile_data() previously dropped manual overrides silently when the override name_x was not in the data or name_y was not in the target. The reconciliation object now carries an unused_overrides slot listing every rejected override with a reason (name_x_not_in_data, name_y_not_in_target, or already_matched), and the functions emit a cli_alert_warning() pointing the user at it. reconcile_summary() includes a count and a per-row listing in the verbose section. Closes #8a (Ayumi Mizuno).
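Computing stage labels from the active stages only can be sketched as follows. This is a minimal base-R illustration of the numbering rule described above, not the cascade's actual code; stage_labels() is a hypothetical helper and the label wording is abbreviated.

```r
# Hypothetical helper: build "Stage i/n" labels from the stages that will
# actually run, so skipped stages never appear in the numbering.
stage_labels <- function(authority = "col", fuzzy = TRUE) {
  stages <- c("Exact", "Normalised")
  if (!is.null(authority)) stages <- c(stages, "Synonym resolution")
  if (isTRUE(fuzzy)) stages <- c(stages, "Fuzzy matching")
  sprintf("Stage %d/%d: %s", seq_along(stages), length(stages), stages)
}

two_stage  <- stage_labels(authority = NULL, fuzzy = FALSE)
# "Stage 1/2: Exact" "Stage 2/2: Normalised"

four_stage <- stage_labels()
no_fuzzy   <- stage_labels(fuzzy = FALSE)
```

Because the denominator is the number of active stages, a run with fuzzy matching switched off reports three stages rather than a fixed four.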
New features
- reconcile_crosswalk() now accepts .csv, .tsv, or .txt (tab-delimited) file paths in addition to data frames. The format is inferred from the file extension. Closes #8b (Ayumi Mizuno).
Documentation
- Installation instructions are standardised on pak::pak(...) across the README and the Getting started vignette. Closes #6 (Ayumi Mizuno).
- The @param authority block in reconcile_tree() / reconcile_data() now reflects which authorities are actually supported. tpl, slb, wd, iucn, fb are flagged as experimental (“coverage and current availability vary”); ott is documented as not supported in the current default taxadb release.
- The bird-workflow vignette now guards its caper and MCMCglmm chunks with eval = requireNamespace(..., quietly = TRUE), so the vignette knits cleanly for users (and CRAN check environments) without those Suggests packages installed.
- Added a hex sticker logo (man/figures/logo.svg / logo.png) to the README and the pkgdown site.
- pkgdown site rebuilt to fix stale search-index links that pointed to 404 pages for reconcile_override_batch and reconcile_suggest. Closes #7 (Ayumi Mizuno).
Breaking changes
- The first argument of 13 exported functions has been renamed from x to reconciliation: reconcile_apply(), reconcile_augment(), reconcile_export(), reconcile_mapping(), reconcile_merge(), reconcile_override(), reconcile_override_batch(), reconcile_plot(), reconcile_report(), reconcile_review(), reconcile_splits_lumps(), reconcile_suggest(), and reconcile_summary(). This fixes #3 (Santiago Ortega), where reconcile_apply(result = res_tree, ...) raised an “unused argument” error because the parameter was named x. Positional calls (reconcile_apply(res_tree, ...)) continue to work unchanged; only code that passed the reconciliation as x = ... needs updating to reconciliation = ....
- reconcile_diff(x, y) is intentionally unchanged: both arguments are reconciliation objects in a symmetric comparison, so neither is the “reconciliation”.
Documentation
- New vignette subsections on multi-row species and asymmetric datasets in the bird workflow vignette, addressing #1 (Ayumi Mizuno). Shows how to aggregate to species level before merging, how to join the mapping back to a full multi-row dataset, and when to pick how = "inner" vs how = "left" for focal study × reference database merges.
- reconcile_merge() help page now carries the same guidance in a @details block, covering pairwise row expansion warnings and the four join types.
- Rewrote every exported function’s help page for an ecologist / evolutionary biologist audience (the primary users running PCM, PGLS, and PGLMM analyses). The previous pages read like API reference; the new pages explain the reconciliation workflow, taxadb authority choices, fuzzy matching semantics, and tree augmentation trade-offs in terms the audience actually uses.
- ?prepR4pcm is now a proper package landing page with a canonical workflow code block, a “Key concepts” section (reconciliation object, four-stage cascade, provenance, splits/lumps, augmentation), and function-family pointers.
- Each taxadb authority option in reconcile_data() and reconcile_tree() is now glossed in one line ("col" = Catalogue of Life, "gbif" = GBIF Backbone, etc.) to help users choose without consulting the taxadb manual.
- reconcile_augment() gained a “When to use this” section with explicit cautions about reporting augmented tips, running sensitivity analyses, and preferring PhyloMaker / TACT for publication-grade augmentation.
- reconcile_suggest() now explains Levenshtein similarity and the 60/40 genus/epithet weighting in plain language.
- reconcile_mapping() documents every column of the returned tibble, including when name_resolved is NA and the full match_type vocabulary.
- New help page for the reconciliation S3 class documenting its four list components (mapping, meta, counts, overrides) and S3 methods. This also clears previous R CMD check “Missing link(s)” warnings from cross-references.
- Reorganised the pkgdown reference index into seven task-oriented groups (Match species names / Inspect and audit / Corrections and crosswalks / Apply, merge, export / Augment phylogenies / Name utilities / Bundled example data), each with a short descriptive line.
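To make the Levenshtein similarity and 60/40 weighting mentioned for reconcile_suggest() concrete, here is a minimal base-R sketch. It rests on two assumptions that the changelog does not spell out: similarity is taken as 1 minus the edit distance divided by the longer string's length, and the 60% weight applies to the genus and the 40% weight to the epithet. lev_sim() and name_similarity() are hypothetical names, and only simple binomials are handled.

```r
# Levenshtein similarity in [0, 1]; 1 means identical strings.
# Assumed definition: 1 - edit distance / length of the longer string.
lev_sim <- function(a, b) {
  d <- drop(utils::adist(a, b))
  1 - d / max(nchar(a), nchar(b), 1)
}

# Assumed combination: weighted mean of genus and epithet similarity.
name_similarity <- function(x, y, w_genus = 0.6) {
  px <- strsplit(x, "[ _]+")[[1]]
  py <- strsplit(y, "[ _]+")[[1]]
  w_genus * lev_sim(px[1], py[1]) + (1 - w_genus) * lev_sim(px[2], py[2])
}

# One-letter typo in the epithet vs one-letter typo in the genus.
epithet_typo <- name_similarity("Corvus corax", "Corvus corex")
genus_typo   <- name_similarity("Corvus corax", "Corvis corax")
```

Under these assumptions a genus typo costs more than an epithet typo of similar size, which is the practical consequence of weighting the genus more heavily.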
Tests
- Added a combinatorial test layer that stresses parameter combinations rather than single cases. Three new test files (test-authority-mocked.R, test-workflows.R, test-robustness.R) and parametric grid extensions to nine existing files take the suite from ~311 expectations to 1,868 expectations across 252 test_that() blocks (0 failures).
- Every historical bug that has shipped (#495 cartesian merge explosion, the drop_unresolved no-op, the diacritics regex failure, factor coercion, silent multi-phylo handling) lived in parameter combinations that single-axis tests never exercised. The new layer tests combinations and asserts invariants (row counts, NA counts, tree tip counts, set membership, S3 class, idempotence).
- test-authority-mocked.R stubs pr_lookup_authority() via local_mocked_bindings() so the synonym-resolution branch is exercised without hitting taxadb or the network, covering accepted→synonym, synonym→accepted, neither-found, and network-error scenarios for col / itis / gbif / ncbi.
- test-workflows.R chains functions end-to-end the way real users do, including the #495 asymmetric pattern (750 shared / 96 only_x / 10,400 only_y).
- test-robustness.R covers adversarial inputs: empty data, all-NA species columns, factor columns, Unicode (diacritics and Japanese kana), minimal 1-row/1-tip cases, large-input smoke tests, invalid types, and malformed arguments.
prepR4pcm 0.3.0
New features
- reconcile_report() generates a self-contained HTML report documenting all name-matching decisions, suitable for sharing or archiving.
- reconcile_merge() joins two reconciled datasets into a single analysis-ready data frame using the mapping table from reconcile_data().
- reconcile_augment() grafts unresolved species onto a tree using genus-level placement (sister to congener or MRCA of congeners).
- reconcile_splits_lumps() detects taxonomic splits and lumps from synonym resolution results.
- Fuzzy matching via fuzzy = TRUE catches likely typos using component-based Levenshtein similarity.
- resolve = "flag" marks low-confidence matches for manual review.
- reconcile_plot() visualises match composition as a bar chart or pie chart using base R graphics.
- reconcile_suggest() shows the closest fuzzy candidates for each unresolved species, useful for finding near-misses.
- reconcile_diff() compares two reconciliation objects and reports gained/lost matches, type changes, and target changes.
- reconcile_override_batch() applies multiple overrides at once from a data frame or CSV file.
- reconcile_review() provides an interactive console interface for accepting or rejecting flagged and fuzzy matches one at a time.
- print.reconciliation() now shows a coverage bar: [████████████████████░░░░░░░░░░] 71% (657/919).
- Stage-level progress messages for large datasets (> 500 species) in the matching cascade.
Data
- Example datasets expanded from ~200 to ~920 species (Corvoidea + allied passerine families) for realistic demonstrations.
- All \dontrun{} examples replaced with runnable examples using bundled data.
Performance
- Fuzzy matching (pr_fuzzy_match()) now uses genus pre-filtering: only species whose genus is within 2 edits are compared. This reduces computation from O(n×m) to roughly O(n×k) where k is the number of congeners+near-genera, giving ~100× speedup on large datasets (e.g., 1360×6504 from >10 min to ~3 sec).
- reconcile_suggest() uses the same genus pre-filter and vectorised adist(), making it usable for 1000+ unresolved species.
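The genus pre-filter can be illustrated with base R's adist(). The sketch below is not pr_fuzzy_match() itself; genus_of() and fuzzy_candidates() are hypothetical helpers that show how a small genus-by-genus distance matrix narrows the species that need a full comparison.

```r
# Genus = everything before the first space or underscore.
genus_of <- function(x) sub("[ _].*$", "", x)

# For each query name, keep only targets whose genus is within
# `max_genus_edits` Levenshtein edits of the query genus.
fuzzy_candidates <- function(query, targets, max_genus_edits = 2) {
  target_genera <- unique(genus_of(targets))
  # Query-genus x target-genus distances: far smaller than a full
  # species x species matrix.
  d <- utils::adist(genus_of(query), target_genera)
  lapply(seq_along(query), function(i) {
    near <- target_genera[d[i, ] <= max_genus_edits]
    targets[genus_of(targets) %in% near]
  })
}

query   <- c("Corvis corax", "Pica pika")
targets <- c("Corvus corax", "Corvus corone", "Pica pica", "Garrulus glandarius")

cands <- fuzzy_candidates(query, targets)
```

Only the returned candidates would then go through the more expensive species-level similarity scoring, which is where the saving over an all-pairs comparison comes from.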
Bug fixes
- Fixed crosswalk overrides having no effect. The cascade’s override pre-stage required exact string matches between override names and input names. When override names used spaces but tree tips used underscores (or vice versa), no overrides were applied. Overrides now normalise both sides before comparison.
- Fixed reconcile_plot() error when passing main argument: the internal pr_plot_bar() hardcoded main and also passed ..., causing a duplicate argument error.
- Fixed .Rbuildignore patterns that excluded .rda data files on case-insensitive filesystems.
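The crosswalk-override fix in this list (normalising both sides before comparison) can be sketched in base R. This is an illustration under the assumption that normalisation only unifies underscores and spaces; normalise_name() and apply_overrides() are hypothetical helpers, and the override target below is an invented example value.

```r
# Assumed normalisation: trim, then collapse underscores/spaces to one space.
normalise_name <- function(x) gsub("[_ ]+", " ", trimws(x))

# Look up each input name in the override table after normalising both sides.
# Returns the override target, or NA where no override applies.
apply_overrides <- function(input, overrides) {
  idx <- match(normalise_name(input), normalise_name(overrides$name_x))
  overrides$name_y[idx]
}

input <- c("Corvus_corax", "Pica_pica")           # tree tips use underscores
overrides <- data.frame(
  name_x = "Corvus corax",                        # override written with a space
  name_y = "Corvus_corax_sinuatus",               # illustrative target only
  stringsAsFactors = FALSE
)

exact_hits <- match(input, overrides$name_x)      # exact matching finds nothing
resolved   <- apply_overrides(input, overrides)
```

Exact string matching misses the override entirely, whereas the normalised comparison applies it to the first name and leaves the second untouched.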
prepR4pcm 0.2.0
- Core reconciliation engine: exact → normalised → synonym → fuzzy cascade.
- reconcile_tree(), reconcile_data(), reconcile_trees().
- reconcile_apply(), reconcile_export(), reconcile_override().
- reconcile_to_trees(), reconcile_multi(), reconcile_crosswalk().
- reconcile_summary(), reconcile_mapping().
- Bundled datasets: avonet_subset, nesttrait_subset, delhey_subset, crosswalk_birdlife_birdtree, tree_jetz, tree_clements25.
- Bird workflow vignette.