Fuzzy-match two sets of species names — pr_fuzzy

Matching is component-based: the genus and the epithet are compared separately, and the two similarity scores are then combined with weights (genus 0.6, epithet 0.4). The heavier genus weight reflects that genus-level errors are more informative. Levenshtein distances come from base R utils::adist(), so no extra dependencies are needed.
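
As an illustration of the weighting described above, here is a minimal sketch of a component-based score. The helper names lev_sim() and component_score() are hypothetical and not part of the package. Both the normalisation (1 minus the distance divided by the longer string's length) and the use of a plain weighted sum are assumptions; the package's internal formula may differ.

```r
# Hypothetical helper: Levenshtein similarity scaled to [0, 1].
# ASSUMPTION: distance is normalised by the longer string's length.
lev_sim <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)
  1 - d / longest
}

# Hypothetical helper: score one pair of binomial names.
# ASSUMPTION: each name has the form "Genus epithet", and the two
# component similarities are combined as a weighted sum.
component_score <- function(x, y, w_genus = 0.6, w_epithet = 0.4) {
  px <- strsplit(x, " +")[[1]]
  py <- strsplit(y, " +")[[1]]
  w_genus * lev_sim(px[1], py[1]) + w_epithet * lev_sim(px[2], py[2])
}

# One-letter typo in a 7-letter genus, identical epithet:
# 0.6 * (6 / 7) + 0.4 * 1, about 0.914
component_score("Quercus robur", "Quercas robur")
```

Under these assumptions a single genus typo costs slightly more than a single epithet typo of similar relative size, which is the effect the 0.6/0.4 weights are meant to produce.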

Usage

pr_fuzzy_match(names_x, names_y, threshold = 0.9, rank = "species")

Arguments

names_x: Character vector of species names to match.
names_y: Character vector of species names to match against.
threshold: Numeric (0–1). Minimum similarity score. Default 0.9.
rank: Character. "species" or "subspecies". Default "species".

Value

A tibble with columns: name_x, name_y, score, notes.

Details

Genus pre-filtering is applied: two names are compared only if their genera are within 2 edits of each other. For large datasets this removes most pairwise comparisons before any full-name scoring takes place.
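
The pre-filter can be pictured with the following sketch, which keeps only the name pairs whose genera are within a given number of edits. The helpers genus_of() and candidate_pairs() are hypothetical names used for illustration, and the sketch assumes the genus is the first space-delimited word of each name; the package's actual implementation may be organised differently.

```r
# Hypothetical helper: take the first word of each name as the genus.
genus_of <- function(x) sub(" .*$", "", x)

# Hypothetical helper: return the name pairs that survive the genus filter.
candidate_pairs <- function(names_x, names_y, max_edits = 2) {
  gx <- genus_of(names_x)
  gy <- genus_of(names_y)

  # Compute distances once per unique genus, not once per name.
  ux <- unique(gx)
  uy <- unique(gy)
  d <- utils::adist(ux, uy)

  # Expand back to a names_x by names_y logical matrix of allowed pairs.
  keep <- d[match(gx, ux), match(gy, uy), drop = FALSE] <= max_edits
  idx <- which(keep, arr.ind = TRUE)

  data.frame(
    name_x = names_x[idx[, 1]],
    name_y = names_y[idx[, 2]],
    stringsAsFactors = FALSE
  )
}

x <- c("Quercus robur", "Pinus sylvestris")
y <- c("Quercas robur", "Pinus silvestris", "Betula pendula")

# Of the 6 possible pairs, only the Quercus/Quercas and Pinus/Pinus
# pairs pass the filter and would go on to full scoring.
candidate_pairs(x, y)
```

Computing distances on unique genera keeps the filter cheap, since datasets typically contain far fewer distinct genera than names.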