std_collection

Standardize Herbarium Acronyms in Collection Records
barRoso::std_collection()

Description

Cleans and standardizes herbarium acronyms in biodiversity datasets by harmonizing values from the collectionCode and institutionCode fields. The function corrects common issues in GBIF and other aggregated records, replacing ambiguous or placeholder codes with recognized herbarium acronyms. Missing values are handled through fallback rules, and the original columns can optionally be retained.

Details

This function is part of the barRoso package. It applies a large set of conditional replacements based on known patterns and falls back to institutionCode when collectionCode is missing or ambiguous. Common aliases such as "Herbarium", "Botany", or "Angiosperms" are converted to valid acronyms when possible.
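The fallback behaviour described above can be pictured with a minimal base R sketch. This is not the barRoso implementation: the helper name fallback_collection_code, the ambiguous_codes vector, and the toy records are assumptions made for illustration, and the package itself applies a much larger set of replacement rules.

```r
# Illustrative sketch only; NOT the code used inside barRoso.
# Hypothetical list of placeholder values treated as ambiguous.
ambiguous_codes <- c("Herbarium", "Botany", "Angiosperms")

# Hypothetical helper: replace missing or ambiguous collection codes
# with the institution code from the same record, when one is available.
fallback_collection_code <- function(df,
                                     colname_collectionCode = "collectionCode",
                                     colname_institutionCode = "institutionCode") {
  coll <- as.character(df[[colname_collectionCode]])
  inst <- as.character(df[[colname_institutionCode]])

  # A code needs a fallback when it is NA, empty, or a known placeholder.
  needs_fallback <- is.na(coll) | trimws(coll) == "" | coll %in% ambiguous_codes

  # Only fall back when the institution code itself is usable.
  has_institution <- !is.na(inst) & trimws(inst) != ""

  use <- needs_fallback & has_institution
  coll[use] <- inst[use]

  df[[colname_collectionCode]] <- coll
  df
}

# Toy records, invented for this example.
records <- data.frame(
  collectionCode  = c("RB", "Herbarium", NA, "Botany"),
  institutionCode = c("JBRJ", "NY", "K", NA),
  stringsAsFactors = FALSE
)

cleaned <- fallback_collection_code(records)
print(cleaned$collectionCode)
# "RB" "NY" "K" "Botany"
```

In this sketch the last record keeps "Botany" because no institution code is available to fall back on; how std_collection() treats such cases is not specified here.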

Arguments

Argument                 Description
df                       A data frame with biodiversity specimen records.
colname_collectionCode   Name of the column containing collection codes (default: "collectionCode").
colname_institutionCode  Name of the column containing institution codes (default: "institutionCode").
rm_original_column       Logical; if TRUE, original columns are removed after cleaning (default: TRUE).

Value

A data frame with standardized collection codes in the collectionCode column. If rm_original_column = FALSE, the original values are saved with a *Original suffix.

Examples

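library(barRoso)

# Read the occurrence records (the file name is illustrative)
df <- read.csv("gbif_download.csv")

# Standardize the acronyms and keep the original values for comparison
df_clean <- std_collection(df,
                           colname_collectionCode = "collection_code",
                           colname_institutionCode = "institution_code",
                           rm_original_column = FALSE)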