Merge two reconciled datasets

After reconciling two datasets with reconcile_data(), use this function to join them into a single analysis-ready data frame. The reconciliation mapping table provides the species-level join key, so names that differ between the two datasets (due to formatting, synonyms, or typos) are correctly linked.

Usage

reconcile_merge(
  reconciliation,
  data_x,
  data_y,
  species_col_x = NULL,
  species_col_y = NULL,
  how = c("inner", "left", "full"),
  suffix = c("_x", "_y"),
  drop_unresolved = FALSE
)

Arguments

reconciliation

A reconciliation object (typically from reconcile_data()).

data_x

The first data frame (source x in the reconciliation).

data_y

The second data frame (source y in the reconciliation).

species_col_x

A length-1 character vector. Species column in data_x. Auto-detected if NULL.

species_col_y

A length-1 character vector. Species column in data_y. Auto-detected if NULL.

how

A length-1 character vector. Join type:

"inner" (default): keep only species matched in both datasets.
"left": keep all species from data_x.
"full": keep all species from both datasets.

suffix

A length-2 character vector. Suffixes to disambiguate columns with the same name in both datasets. Default c("_x", "_y").

drop_unresolved

Logical. If TRUE, rows where species_resolved is NA (i.e., species that could not be reconciled) are removed from the final result. Default FALSE (keep all rows, fill unmatched columns with NA). Only relevant for how = "left" or how = "full"; inner joins drop unmatched rows by definition.

Value

A data frame with a species_resolved column as the join key, plus all columns from both datasets (with suffixes added when column names collide).

Details

One row per species. reconcile_merge() works best when each dataset has exactly one row per species. If a species appears in multiple rows (e.g., sex-specific measurements, repeated populations), the merge produces all pairwise combinations of those rows for that species, the same behaviour as base merge(). To avoid unexpected row expansion, aggregate to one row per species before merging, or expect one output row for every pairing of that species' rows (m rows in data_x and n rows in data_y yield m × n rows).
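The row expansion described above can be reproduced with base merge() alone. The sketch below uses made-up toy data (the column names species, sex, population, mass, and clutch are illustrative and not part of the package) and then shows one possible way to collapse each side to one row per species first.

```r
# Toy data: "Parus major" has two rows in x (one per sex) and two rows in y
# (one per population). All values are invented for illustration.
x <- data.frame(
  species = c("Parus major", "Parus major", "Sitta europaea"),
  sex     = c("F", "M", "F"),
  mass    = c(17.5, 18.5, 22.0)
)
y <- data.frame(
  species    = c("Parus major", "Parus major", "Sitta europaea"),
  population = c("north", "south", "north"),
  clutch     = c(9, 8, 7)
)

# Base merge() pairs every x row with every y row of the same species:
# 2 x 2 rows for Parus major plus 1 x 1 row for Sitta europaea.
expanded <- merge(x, y, by = "species")
nrow(expanded)
#> [1] 5

# Aggregating each side to one row per species first avoids the expansion.
x_sp <- aggregate(mass ~ species, data = x, FUN = mean)
y_sp <- aggregate(clutch ~ species, data = y, FUN = mean)
collapsed <- merge(x_sp, y_sp, by = "species")
nrow(collapsed)
#> [1] 2
```

Taking the mean is only one choice of summary; whether it is appropriate depends on the trait being aggregated.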

Asymmetric datasets. When data_y contains many more species than data_x (common when merging against a large reference database), use how = "inner" or how = "left". Inner joins keep only the species present in both datasets; left joins keep all data_x rows and fill data_y columns with NA for unmatched species. Use how = "full" only when you need to retain species unique to either side.
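To make the difference between the three join types concrete, here is a minimal sketch using base merge() on toy data in which the second table is larger than the first. It is an analogue of the how argument, not a description of how reconcile_merge() is implemented; the species names and the mass and clutch columns are invented.

```r
# x has 3 species; y is a larger "reference" table with 5 species.
# Two species ("B b", "C c") occur in both.
x <- data.frame(
  species_resolved = c("A a", "B b", "C c"),
  mass             = c(10, 20, 30)
)
y <- data.frame(
  species_resolved = c("B b", "C c", "D d", "E e", "F f"),
  clutch           = c(2, 3, 4, 5, 6)
)

# Inner join: only species present in both tables.
inner <- merge(x, y, by = "species_resolved")
nrow(inner)
#> [1] 2

# Left join: every x row is kept; clutch is NA for "A a".
left <- merge(x, y, by = "species_resolved", all.x = TRUE)
nrow(left)
#> [1] 3

# Full join: species from either side, so the unmatched
# reference rows ("D d", "E e", "F f") are carried along too.
full <- merge(x, y, by = "species_resolved", all = TRUE)
nrow(full)
#> [1] 6
```

With a reference table of thousands of species, the full join would be dominated by rows that carry no data_x values, which is why the inner or left join is usually the better fit.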

Recommended workflow for multi-row data. Reconcile using a species-level summary (one row per species), inspect the mapping with reconcile_mapping(), then join the mapping back to your full dataset using the species column as key.
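The last step of that workflow, joining the mapping back to the full dataset, might look like the sketch below. The mapping table here is built by hand and its column names (name_x, species_resolved) are assumptions made for illustration; check the actual output of reconcile_mapping() for the real column names before adapting this.

```r
# Full multi-row dataset: several rows per species, with raw names that
# differ from the resolved names (underscore, typo). Values are invented.
full_x <- data.frame(
  Species1 = c("Parus_major", "Parus_major", "Sitta europea"),
  sex      = c("F", "M", "F"),
  mass     = c(17.5, 18.5, 22.0)
)

# Hypothetical stand-in for the mapping table; the columns returned by
# reconcile_mapping() may be named differently.
mapping <- data.frame(
  name_x           = c("Parus_major", "Sitta europea"),
  species_resolved = c("Parus major", "Sitta europaea")
)

# Attach the resolved name to every original row. Because the mapping has
# one row per raw name, the row count of full_x is preserved.
full_x_resolved <- merge(full_x, mapping,
                         by.x = "Species1", by.y = "name_x",
                         all.x = TRUE)
nrow(full_x_resolved)
#> [1] 3
```

The full dataset keeps its multi-row structure, and species_resolved can then serve as the key for any later join or grouping.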

Examples

data(avonet_subset)
data(nesttrait_subset)

rec <- reconcile_data(avonet_subset, nesttrait_subset,
                      x_species = "Species1",
                      y_species = "Scientific_name",
                      authority = NULL, quiet = TRUE)

merged <- reconcile_merge(rec, avonet_subset, nesttrait_subset,
                          species_col_x = "Species1",
                          species_col_y = "Scientific_name")
#> ✔ Merged 916 species (inner join)
cat(sprintf("Merged: %d rows, %d cols\n", nrow(merged), ncol(merged)))
#> Merged: 916 rows, 31 cols
head(merged[, c("species_resolved", "Family1", "Common_name")])
#>           species_resolved      Family1              Common_name
#> 1 Acanthagenys rufogularis Meliphagidae Spiny-cheeked Honeyeater
#> 2       Acanthiza apicalis Acanthizidae         Inland Thornbill
#> 3    Acanthiza chrysorrhoa Acanthizidae  Yellow-rumped Thornbill
#> 4        Acanthiza cinerea Acanthizidae           Grey Thornbill
#> 5        Acanthiza ewingii Acanthizidae      Tasmanian Thornbill
#> 6       Acanthiza inornata Acanthizidae        Western Thornbill
