barroso_cat

Combine and Harmonize Multiple Herbarium Data Sources
barRoso::barroso_cat()

Description

Merges herbarium records from two or more biodiversity data sources into a single harmonized data frame. When the same herbarium appears in more than one source, a preferred source can optionally be given priority, and the overlapping records from the other sources are excluded. By default, records from non-Brazilian herbaria are kept, on the assumption that global repositories are more complete.

Details

This function aligns the column structure of every source, removes redundant records from herbaria that appear in more than one source, and merges all sources into a single output. Duplicates are detected by matching collectionCode values across sources. When duplicates exist, keep_source names the source whose records are preferred.
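
The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: sketch_cat() is a hypothetical helper, the source tracking column is invented for the example, and the rule "drop from the other sources every record whose collectionCode also occurs in the preferred source" is one plausible reading of the duplicate filter. The herbarium codes and values are made-up sample data.

```r
# Hypothetical helper, NOT part of barRoso: illustrates align -> filter -> bind.
sketch_cat <- function(list_sources, keep_source = NULL) {
  stopifnot(is.list(list_sources), !is.null(names(list_sources)))
  stopifnot(is.null(keep_source) || keep_source %in% names(list_sources))

  # 1. Align columns: give every source the union of all column names.
  all_cols <- unique(unlist(lapply(list_sources, names)))
  aligned <- lapply(names(list_sources), function(src) {
    df <- list_sources[[src]]
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df <- df[all_cols]
    df$source <- rep(src, nrow(df))  # assumed tracking column
    df
  })
  names(aligned) <- names(list_sources)

  # 2. Filter duplicates (assumed rule): herbaria already covered by the
  #    preferred source are dropped from every other source.
  if (!is.null(keep_source)) {
    keep_codes <- unique(aligned[[keep_source]]$collectionCode)
    for (src in setdiff(names(aligned), keep_source)) {
      df <- aligned[[src]]
      aligned[[src]] <- df[!df$collectionCode %in% keep_codes, , drop = FALSE]
    }
  }

  # 3. Bind all sources into one data frame.
  out <- do.call(rbind, unname(aligned))
  rownames(out) <- NULL
  out
}

# Made-up sample data with different column sets.
gbif <- data.frame(collectionCode = c("RB", "K"),
                   catalogNumber = c("1", "2"),
                   stringsAsFactors = FALSE)
jabot <- data.frame(collectionCode = c("RB", "HUEFS"),
                    recordedBy = c("A", "B"),
                    stringsAsFactors = FALSE)

combined <- sketch_cat(list(GBIF = gbif, JABOT = jabot), keep_source = "GBIF")
all_rows <- sketch_cat(list(GBIF = gbif, JABOT = jabot))
```

With keep_source = "GBIF", the JABOT record for herbarium RB is dropped because GBIF already covers that collectionCode. With keep_source left as NULL, nothing is filtered.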

Arguments

Argument        Description
list_sources    A named list of data frames. Each element represents a herbarium data source. The names of the list are used to track the source origin for internal filtering.
keep_source     Optional character string specifying the preferred data source (e.g., "GBIF") for resolving duplicate collectionCode conflicts. If NULL, all records are retained.
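
Because the list names identify each source, it can help to check the inputs before calling the function. The sketch below is a hypothetical pre-flight check, not something barRoso is known to provide: check_sources() is an invented name, and the requirements that every source has a collectionCode column and that keep_source matches one of the list names are assumptions drawn from the argument descriptions above.

```r
# Hypothetical input check, NOT part of barRoso.
check_sources <- function(list_sources, keep_source = NULL) {
  source_names <- names(list_sources)
  stopifnot(
    is.list(list_sources),
    !is.null(source_names),            # the list must be named
    all(nzchar(source_names)),         # every element needs a name
    anyDuplicated(source_names) == 0,  # names must be unique
    all(vapply(list_sources, is.data.frame, logical(1))),
    # Assumption: duplicate filtering needs collectionCode in every source.
    all(vapply(list_sources,
               function(df) "collectionCode" %in% names(df),
               logical(1))),
    # Assumption: keep_source must be NULL or one of the list names.
    is.null(keep_source) ||
      (is.character(keep_source) && length(keep_source) == 1 &&
         keep_source %in% source_names)
  )
  invisible(TRUE)
}

# Made-up stand-ins for real downloads.
gbif_data <- data.frame(collectionCode = "RB", stringsAsFactors = FALSE)
splink_data <- data.frame(collectionCode = "SP", stringsAsFactors = FALSE)

ok <- check_sources(list(GBIF = gbif_data, speciesLink = splink_data),
                    keep_source = "GBIF")
```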

Value

A harmonized data frame combining all provided herbarium sources, with columns aligned and optionally filtered to resolve duplicate collections.

Examples

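# gbif_data, splink_data, and jabot_data are data frames of herbarium
# records loaded beforehand (not created here).
library(barRoso)

combined_df <- barroso_cat(list_sources = list(GBIF = gbif_data,
                                               speciesLink = splink_data,
                                               JABOT = jabot_data),
                           keep_source = "GBIF")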