barroso_std

Standardize Biodiversity Records Across Multiple Fields
barRoso::barroso_std()

Description

A wrapper function that performs integrated cleaning and standardization of biodiversity collection records using the barRoso package. This includes harmonizing taxonomic, geographic, collector, and type status information, as well as flagging or removing unvouchered and duplicate specimens.

Details

This function orchestrates several std_* functions from the barRoso package to clean records from virtual herbaria and biodiversity portals. It handles multilingual field names, missing data, inconsistent formatting, and dataset chunking for large inputs. The function also detects and optionally removes duplicate records and specimens lacking voucher information.
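The chunking mentioned above can be pictured with a small base R sketch. This is not barRoso's implementation: `process_in_chunks()` and `clean_collector()` are hypothetical helpers written for illustration, and the chunk size and the `recordedBy_std` output column are assumptions.

```r
# Hypothetical helper (not part of barRoso): apply a cleaning function
# to a data frame in chunks of at most `chunk_size` rows.
process_in_chunks <- function(df, fun, chunk_size = 1000) {
  if (nrow(df) == 0) return(fun(df))
  # Assign each row to a chunk: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
  pieces <- lapply(split(df, chunk_id), fun)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Hypothetical cleaning step: trim whitespace and treat empty strings as missing.
clean_collector <- function(chunk) {
  x <- trimws(chunk$recordedBy)
  x[which(x == "")] <- NA_character_
  chunk$recordedBy_std <- x
  chunk
}

# Toy input, invented for this example
raw <- data.frame(
  recordedBy = c(" Silva, J. ", "", "Pereira, M."),
  recordNumber = c("101", "102", "103"),
  stringsAsFactors = FALSE
)

res <- process_in_chunks(raw, clean_collector, chunk_size = 2)
```

Processing in chunks keeps intermediate objects small and returns the rows in their original order, which matters when later steps compare records against each other.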

Arguments

Argument Description
(first argument) Input data frame containing raw biodiversity records.
unvouchered Logical; if TRUE, removes unvouchered wood/seed/spirit specimens (default: TRUE).
delunkcoll Logical; if TRUE, removes records with unknown collectors (default: FALSE).
flag_missid Reserved for future use. Currently not implemented.
flag_duplicates Logical; if TRUE, flags duplicates with a logical column (default: TRUE).
rm_duplicates Logical; if TRUE, removes duplicate specimens (default: FALSE).
colname_recordedBy Column name for collector names.
colname_recordNumber Column name for collector number.
colname_continent Column name for continent.
colname_country Column name for country.
colname_stateProvince Column name for state/province.
colname_county Column name for county.
colname_municipality Column name for municipality.
colname_locality Column name for locality.
colname_collectionCode Column name for collection code.
colname_institutionCode Column name for institution code.
colname_typeStatus Column name for type status.
colname_family Column name for family.
colname_genus Column name for genus.
colname_specificEpithet Column name for specific epithet.
rm_original_column Logical; if TRUE, removes the original columns after standardization.
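The difference between flag_duplicates and rm_duplicates can be illustrated with plain base R. The sketch below is only an approximation under stated assumptions: it treats records sharing the same collector and collector number as duplicates, which may not match the criteria barRoso actually applies, and the `is_duplicate` column name is hypothetical.

```r
# Toy records, invented for this example
records <- data.frame(
  recordedBy = c("Silva", "Silva", "Pereira", "Silva"),
  recordNumber = c("101", "101", "7", "102"),
  collectionCode = c("RB", "K", "RB", "RB"),
  stringsAsFactors = FALSE
)

# Assumed duplicate key: collector name plus collector number
key <- paste(records$recordedBy, records$recordNumber, sep = "|")

# Flagging: mark every record whose key occurs more than once,
# keeping all rows in the data frame
records$is_duplicate <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Removing: keep only the first record for each key
deduped <- records[!duplicated(key), ]
```

Flagging is non-destructive, so the duplicated sheets held by different collections remain available for inspection, whereas removal shortens the data frame.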

Value

A fully cleaned and standardized data frame ready for downstream reconciliation, duplicate handling, and label generation.

Examples

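library(barRoso)

# Read the raw records exported from a herbarium or biodiversity portal
df <- read.csv("raw_herbarium_data.csv")

# Standardize the records; here the country and state/province columns
# carry Portuguese names, and duplicate specimens are removed
df_std <- barroso_std(df,
                      colname_country = "pais",
                      colname_stateProvince = "estado",
                      rm_duplicates = TRUE)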