barRoso::std_collection()
Description
Cleans and standardizes herbarium acronyms in biodiversity datasets by harmonizing values from the collectionCode and institutionCode fields. The function corrects common issues in GBIF and other aggregated records, replacing ambiguous or placeholder codes with recognized herbarium acronyms. It also handles missing values through fallback rules and can optionally retain the original columns.
Details
This function is part of the barRoso package. It applies a large set of conditional replacements based on known patterns and falls back on institutionCode when collectionCode is missing or ambiguous. Common aliases such as "Herbarium", "Botany", or "Angiosperms" are converted to valid acronyms where possible.
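The fallback idea described above can be illustrated with a minimal base-R sketch. It is not barRoso's implementation: the function name `sketch_fallback` and the short list of generic aliases are hypothetical, and the real function applies many more pattern-specific replacements. Here, any collection code that is missing, empty, or a generic alias is simply replaced by the value in the institution code column.

```r
# Hypothetical sketch, NOT barRoso's implementation: missing or generic
# collection codes are replaced by the corresponding institution code.
sketch_fallback <- function(df,
                            colname_collectionCode = "collectionCode",
                            colname_institutionCode = "institutionCode") {
  # Assumed (illustrative) list of placeholder values treated as ambiguous
  generic <- c("herbarium", "botany", "angiosperms")

  coll <- trimws(as.character(df[[colname_collectionCode]]))
  inst <- trimws(as.character(df[[colname_institutionCode]]))

  # A code is unusable if it is NA, empty, or a generic alias (case-insensitive)
  unusable <- is.na(coll) | coll == "" | tolower(coll) %in% generic

  # Fall back on institutionCode; the result stays NA if that is also missing
  coll[unusable] <- inst[unusable]
  df[[colname_collectionCode]] <- coll
  df
}

# Toy records for illustration only
records <- data.frame(
  collectionCode  = c("Herbarium", NA, "RB", ""),
  institutionCode = c("NY", "K", "JBRJ", "MO"),
  stringsAsFactors = FALSE
)
sketch_fallback(records)$collectionCode
# returns c("NY", "K", "RB", "MO")
```

Only rows whose collection code is unusable are touched, so a valid code such as "RB" is kept even when the institution code differs.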
Arguments
| Argument | Description |
|---|---|
| df | A data frame with biodiversity specimen records. |
| colname_collectionCode | Name of the column containing collection codes (default: "collectionCode"). |
| colname_institutionCode | Name of the column containing institution codes (default: "institutionCode"). |
| rm_original_column | Logical; if TRUE, original columns are removed after cleaning (default: TRUE). |
Value
A data frame with standardized collection codes in the collectionCode column. If rm_original_column = FALSE, the original values are kept in additional columns whose names carry the suffix Original (*Original).
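The retention behavior controlled by rm_original_column can be sketched as a backup-then-clean step. This is an illustration under assumptions: the helper `sketch_clean_with_backup` and its `clean_fun` argument are hypothetical, and the exact column naming used by barRoso (here, the column name followed by `Original`) is inferred from the *Original pattern above rather than confirmed.

```r
# Hypothetical sketch of the backup/cleanup logic; not barRoso's code.
sketch_clean_with_backup <- function(df, col, clean_fun, rm_original_column = TRUE) {
  backup <- paste0(col, "Original")        # e.g. collectionCode -> collectionCodeOriginal
  df[[backup]] <- df[[col]]                # keep the raw values
  df[[col]] <- clean_fun(df[[backup]])     # write cleaned values into the main column
  if (rm_original_column) {
    df[[backup]] <- NULL                   # drop the backup column when requested
  }
  df
}

# toupper stands in for the real cleaning rules
records <- data.frame(collectionCode = c("ny", "k"), stringsAsFactors = FALSE)
kept <- sketch_clean_with_backup(records, "collectionCode", toupper,
                                 rm_original_column = FALSE)
names(kept)
# returns c("collectionCode", "collectionCodeOriginal")
```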
Examples
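library(barRoso)

df <- read.csv("gbif_download.csv")

# Column names differ from the Darwin Core defaults, so they are passed explicitly
df_clean <- std_collection(df,
                           colname_collectionCode = "collection_code",
                           colname_institutionCode = "institution_code",
                           rm_original_column = FALSE)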