
For every species that the four-stage cascade failed to resolve, reconcile_suggest() returns the top n candidate matches from the reference source (y). The cascade is the exact -> normalised -> synonym -> fuzzy matching process run by reconcile_tree() and reconcile_data() (see ?prepR4pcm). This is often the quickest way to audit orphan species: a typo or a species epithet that drifted by one letter will usually appear near the top of the list, and you can then pass the fix to reconcile_override() or reconcile_override_batch().

Usage

reconcile_suggest(reconciliation, n = 3, threshold = 0.7, quiet = FALSE)

Arguments

reconciliation

A reconciliation object returned by reconcile_tree(), reconcile_data(), or a related matcher.

n

Integer. Maximum number of suggestions to return per unresolved species. Default 3.

threshold

Numeric in [0, 1]. Minimum weighted similarity score for a candidate to be listed. Default 0.7, which is deliberately permissive so that borderline candidates still surface for review. Raise to 0.85 for a tighter shortlist.

quiet

Logical. Suppresses informational messages when TRUE. Default FALSE.

Value

A tibble with one row per (unresolved, suggestion) pair:

unresolved

The unresolved name from source x.

suggestion

A candidate name from source y.

score

Weighted similarity in [threshold, 1].

Rows are sorted by unresolved then descending score, so the first suggestion for each name is the best candidate.

Details

Similarity is derived from the Levenshtein edit distance between normalised names, i.e. the minimum number of character insertions, deletions and substitutions needed to turn one name into the other. The distance is divided by the length of the longer name and the result is subtracted from 1, so identical names score 1. The final score weights genus similarity at 60% and specific-epithet similarity at 40%, which penalises genus-level disagreement heavily while tolerating small epithet differences.
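The scoring described above can be sketched in base R. This is an illustration only, not the package's internal code: lev_sim() and weighted_sim() are hypothetical helpers, and the normalisation step (lower-casing, then splitting on spaces or underscores) is an assumption about what "normalised" means here.

```r
# Hypothetical helper (not part of prepR4pcm): Levenshtein similarity,
# i.e. 1 - edit distance / length of the longer string.
lev_sim <- function(a, b) {
  d <- drop(utils::adist(a, b))  # base R generalised Levenshtein distance
  1 - d / max(nchar(a), nchar(b))
}

# Hypothetical helper: 60% genus + 40% epithet weighting.
# Assumes binomial names separated by a space or an underscore.
weighted_sim <- function(x, y) {
  px <- strsplit(tolower(x), "[ _]+")[[1]]
  py <- strsplit(tolower(y), "[ _]+")[[1]]
  0.6 * lev_sim(px[1], py[1]) + 0.4 * lev_sim(px[2], py[2])
}

weighted_sim("Lanius borealis", "Lanius_dorsalis")
#> [1] 0.9
```

Under these assumptions the genus matches exactly (similarity 1) and the epithets differ by two substitutions out of eight characters (similarity 0.75), giving 0.6 + 0.4 * 0.75 = 0.9.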

For computational efficiency on large trees, reconcile_suggest() only compares a query name against reference names whose genus is within 2 character edits of the query genus. This can very occasionally miss a match where both the genus and the epithet are badly misspelled simultaneously; if you suspect that, lower the threshold and inspect manually.
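A genus pre-filter of this kind might look roughly like the sketch below. genus_of() and prefilter_by_genus() are hypothetical names introduced for illustration, and the reference vector is made-up example input; the package's actual implementation may differ.

```r
# Hypothetical helper: extract the lower-cased genus from a binomial name.
genus_of <- function(x) sub("[ _].*$", "", tolower(x))

# Hypothetical helper: keep only reference names whose genus is within
# `max_edits` character edits of the query genus.
prefilter_by_genus <- function(query, reference, max_edits = 2) {
  d <- drop(utils::adist(genus_of(query), genus_of(reference)))
  reference[d <= max_edits]
}

ref <- c("Lanius_dorsalis", "Lalage_leucomela", "Lanio_fulvus")
prefilter_by_genus("Lanius borealis", ref)
#> [1] "Lanius_dorsalis" "Lanio_fulvus"
```

Here "Lalage" is four edits away from "Lanius" and is dropped before any epithet comparison, which is where the speed-up on large trees would come from.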

Examples

data(avonet_subset)
data(tree_jetz)
rec <- reconcile_tree(avonet_subset, tree_jetz,
                      x_species = "Species1", authority = NULL)
#>  Reconciling 919 data names vs 657 tree tips
#>  Matching 919 x 657 names through 2 stages...
#>  Stage 1/2: Exact matching...
#>  Stage 2/2: Normalised matching (0 matched so far)...
#>  Matched 657/919 data names to tree tips

suggestions <- reconcile_suggest(rec, n = 2, threshold = 0.85)
#>  Found suggestions for 6 of 262 unresolved species.
head(suggestions, 10)
#> # A tibble: 7 × 3
#>   unresolved            suggestion            score
#>   <chr>                 <chr>                 <dbl>
#> 1 Coracina panayensis   Coracina_papuensis    0.88 
#> 2 Lalage leucoptera     Lalage_leucomela      0.88 
#> 3 Lalage leucoptera     Lalage_leucopyga      0.88 
#> 4 Lalage melanoptera    Lalage_melanoleuca    0.854
#> 5 Lanius borealis       Lanius_dorsalis       0.9  
#> 6 Myiagra cervinicolor  Myiagra_cervinicauda  0.867
#> 7 Pericrocotus montanus Pericrocotus_miniatus 0.85
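
Because rows are sorted by unresolved name and then by descending score, keeping the first row per name yields the best candidate for each. The sketch below rebuilds three rows from the output above as a plain data frame so that it runs without the package; the same indexing should work on the returned tibble.

```r
# Three rows copied from the example output above.
suggestions <- data.frame(
  unresolved = c("Lalage leucoptera", "Lalage leucoptera", "Lanius borealis"),
  suggestion = c("Lalage_leucomela", "Lalage_leucopyga", "Lanius_dorsalis"),
  score      = c(0.88, 0.88, 0.9)
)

# Keep the first (highest-scoring) row for each unresolved name.
best <- suggestions[!duplicated(suggestions$unresolved), ]
best$suggestion
#> [1] "Lalage_leucomela" "Lanius_dorsalis"
```

Note that the two Lalage candidates tie at 0.88, so the row kept for that name is simply the first one listed; tied candidates are worth checking by hand before applying an override.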