For every species that the four-stage cascade failed to resolve,
reconcile_suggest() returns up to n candidate matches from the
reference source (y), best first. The cascade is the exact -> normalised ->
synonym -> fuzzy matching process run by reconcile_tree() and
reconcile_data() (see ?prepR4pcm). This is the most efficient
way to audit orphan species: a typo or a species epithet that
drifted by one letter will usually appear near the top of the list,
and you can then feed the fix to reconcile_override() or
reconcile_override_batch().
Arguments
- reconciliation
A reconciliation object returned by reconcile_tree(), reconcile_data(), or a related matcher.
- n
Integer. Maximum number of suggestions to return per unresolved species. Default 3.
- threshold
Numeric in [0, 1]. Minimum weighted similarity score for a candidate to be listed. Default 0.7 (quite permissive, because the idea is to surface candidates for review). Raise to 0.85 for a tighter shortlist.
- quiet
Logical. Suppresses informational messages when TRUE. Default FALSE.
Value
A tibble with one row per (unresolved, suggestion) pair:
- unresolved
The unresolved name from source x.
- suggestion
A candidate name from source y.
- score
Weighted similarity in [threshold, 1].
Rows are sorted by unresolved then descending score, so the
first suggestion for each name is the best candidate.
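Because of this ordering, the top candidate per name can be pulled out with base R alone. The sketch below uses a small hand-built data frame as a stand-in for the returned tibble (values copied from the example output on this page); it is an illustration, not part of the package.

```r
# Toy stand-in for the tibble returned by reconcile_suggest().
suggestions <- data.frame(
  unresolved = c("Lalage leucoptera", "Lalage leucoptera", "Lanius borealis"),
  suggestion = c("Lalage_leucomela", "Lalage_leucopyga", "Lanius_dorsalis"),
  score      = c(0.88, 0.88, 0.9),
  stringsAsFactors = FALSE
)

# Rows are sorted by name, then by descending score, so the first row
# for each unresolved name is its top-ranked candidate.
best <- suggestions[!duplicated(suggestions$unresolved), ]
best$suggestion
#> [1] "Lalage_leucomela" "Lanius_dorsalis"
```

Note that "Lalage leucoptera" has two candidates tied at 0.88; keeping only the first row hides the tie, so tied scores are worth checking by eye before overriding.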
Details
Similarity is based on the Levenshtein edit distance between normalised names, i.e. the minimum number of character insertions, deletions and substitutions needed to turn one name into the other. The distance is divided by the length of the longer name and subtracted from 1, giving a similarity in [0, 1]. The final score is weighted 60% genus, 40% specific epithet, which heavily penalises genus-level disagreement while tolerating small epithet differences.
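The scoring rule can be reproduced by hand with utils::adist(). The following is a minimal sketch, not the package's internal code: lev_sim() and weighted_score() are hypothetical helpers, and the normalisation step (lower-casing and treating underscores as spaces) is an assumption. It also assumes the similarity is computed separately for genus and epithet before weighting, which agrees with the scores shown in the Examples.

```r
# Hypothetical helper: 1 - (Levenshtein distance / length of the longer string).
lev_sim <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  1 - d / max(nchar(a), nchar(b))
}

# Hypothetical helper: 60% genus + 40% epithet similarity for two binomials.
weighted_score <- function(x, y) {
  # Assumed normalisation: lower-case, underscores treated as spaces.
  px <- strsplit(tolower(gsub("_", " ", x)), " ", fixed = TRUE)[[1]]
  py <- strsplit(tolower(gsub("_", " ", y)), " ", fixed = TRUE)[[1]]
  0.6 * lev_sim(px[1], py[1]) + 0.4 * lev_sim(px[2], py[2])
}

# Same genus; "borealis" vs "dorsalis" differ by 2 substitutions out of 8.
weighted_score("Lanius borealis", "Lanius_dorsalis")
#> [1] 0.9

# Same genus; "panayensis" vs "papuensis" are 3 edits apart, longer length 10.
weighted_score("Coracina panayensis", "Coracina_papuensis")
#> [1] 0.88
```

With an identical genus the score cannot drop below 0.6, which is why raising the threshold mainly tightens how close the epithet has to be.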
For computational efficiency on large trees, reconcile_suggest()
only compares a query name against reference names whose genus is
within 2 character edits of the query genus. This can very
occasionally miss a match where both the genus and the epithet are
badly misspelled simultaneously; if you suspect that, lower the
threshold and inspect manually.
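The genus pre-filter described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the package's implementation: genus_of() and prefilter_by_genus() are hypothetical names, and the genus is taken to be everything before the first space or underscore.

```r
# Hypothetical helper: genus = text before the first space or underscore.
genus_of <- function(x) sub("[ _].*$", "", x)

# Hypothetical helper: keep reference names whose genus is within
# `max_edits` Levenshtein edits of the query genus.
prefilter_by_genus <- function(query, reference, max_edits = 2) {
  d <- utils::adist(genus_of(query), genus_of(reference))[1, ]
  reference[d <= max_edits]
}

reference <- c("Lanius_dorsalis", "Lalage_leucomela", "Myiagra_cervinicauda")

# "Lanuis" is 2 edits from "Lanius", so that candidate survives the filter;
# the other two genera are further away and are never scored.
prefilter_by_genus("Lanuis borealis", reference)
#> [1] "Lanius_dorsalis"
```

Only the names that pass this filter need a full weighted score, which is where the saving on large trees comes from.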
See also
reconcile_override() / reconcile_override_batch() to
act on suggestions; reconcile_review() for an interactive
alternative.
Other reconciliation functions:
reconcile_apply(),
reconcile_augment(),
reconcile_crosswalk(),
reconcile_data(),
reconcile_diff(),
reconcile_export(),
reconcile_mapping(),
reconcile_merge(),
reconcile_multi(),
reconcile_override(),
reconcile_override_batch(),
reconcile_plot(),
reconcile_report(),
reconcile_review(),
reconcile_splits_lumps(),
reconcile_summary(),
reconcile_to_trees(),
reconcile_tree(),
reconcile_trees()
Examples
data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
                      x_species = "Species1", authority = NULL)
#> ℹ Reconciling 919 data names vs 657 tree tips
#> ℹ Matching 919 x 657 names through 2 stages...
#> ℹ Stage 1/2: Exact matching...
#> ℹ Stage 2/2: Normalised matching (0 matched so far)...
#> ✔ Matched 657/919 data names to tree tips
suggestions <- reconcile_suggest(rec, n = 2, threshold = 0.85)
#> ✔ Found suggestions for 6 of 262 unresolved species.
head(suggestions, 10)
#> # A tibble: 7 × 3
#> unresolved suggestion score
#> <chr> <chr> <dbl>
#> 1 Coracina panayensis Coracina_papuensis 0.88
#> 2 Lalage leucoptera Lalage_leucomela 0.88
#> 3 Lalage leucoptera Lalage_leucopyga 0.88
#> 4 Lalage melanoptera Lalage_melanoleuca 0.854
#> 5 Lanius borealis Lanius_dorsalis 0.9
#> 6 Myiagra cervinicolor Myiagra_cervinicauda 0.867
#> 7 Pericrocotus montanus Pericrocotus_miniatus 0.85