Uses component-based similarity: the genus and epithet are matched
separately, and the two scores are then combined with weights (genus
0.6, epithet 0.4) to reflect that genus-level errors are more
informative. Levenshtein distance is computed with base R
utils::adist(), so no extra dependencies are needed.
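As a rough illustration of the weighting described above, the following sketch scores a single pair of binomial names. It is not the package's internal code: the helpers lev_sim() and component_score() are hypothetical, and converting edit distance to a similarity by dividing by the longer string's length is an assumption, since the conversion is not specified here.

```r
# Hypothetical helper: Levenshtein similarity scaled to [0, 1].
# Assumption: distance is normalised by the longer string's length.
lev_sim <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  max_len <- max(nchar(a), nchar(b))
  if (max_len == 0) return(1)
  1 - d / max_len
}

# Hypothetical helper: weighted genus/epithet score for one pair of
# "Genus epithet" names, using the documented weights 0.6 and 0.4.
component_score <- function(x, y, w_genus = 0.6, w_epithet = 0.4) {
  px <- strsplit(x, " ", fixed = TRUE)[[1]]
  py <- strsplit(y, " ", fixed = TRUE)[[1]]
  w_genus * lev_sim(px[1], py[1]) + w_epithet * lev_sim(px[2], py[2])
}

# Identical genus, one substitution in a five-letter epithet:
# 0.6 * 1 + 0.4 * 0.8, i.e. about 0.92.
component_score("Quercus robur", "Quercus robor")
```

Under these assumptions the pair would pass the default threshold of 0.9.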
Usage
pr_fuzzy_match(names_x, names_y, threshold = 0.9, rank = "species")
Arguments
- names_x
Character vector.
- names_y
Character vector.
- threshold
Numeric (0–1). Minimum similarity score. Default 0.9.
- rank
Character. "species" or "subspecies". Default "species".
Value
A tibble with columns: name_x, name_y, score, notes.
Details
Genus pre-filtering is applied: two names are compared only if their
genera are within 2 edits (Levenshtein distance) of each other. This
greatly reduces the number of pairwise comparisons on large datasets.
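The pre-filter can be sketched roughly as follows. This is an illustration under assumptions, not the package's implementation: genus_candidates() is a hypothetical helper, and the genus is taken to be everything before the first space.

```r
# Hypothetical helper: return the name pairs whose genera are within
# max_edits Levenshtein edits of each other.
genus_candidates <- function(names_x, names_y, max_edits = 2) {
  # Assumption: the genus is the text before the first space.
  genus_x <- sub(" .*$", "", names_x)
  genus_y <- sub(" .*$", "", names_y)
  # Matrix of genus-to-genus edit distances (rows = x, columns = y).
  d <- utils::adist(genus_x, genus_y)
  keep <- which(d <= max_edits, arr.ind = TRUE)
  data.frame(
    name_x = names_x[keep[, "row"]],
    name_y = names_y[keep[, "col"]],
    stringsAsFactors = FALSE
  )
}

names_x <- c("Quercus robur", "Pinus sylvestris")
names_y <- c("Qercus robur", "Picea abies", "Pinus silvestris")

# Only pairs that survive the genus filter go on to full scoring.
candidates <- genus_candidates(names_x, names_y)
candidates
```

Here two of the six possible pairs remain, because "Picea" is 3 edits from "Pinus" and further still from "Quercus". The sketch compares every genus in names_x with every genus in names_y; deduplicating genera first would be a natural refinement for large inputs.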