barRoso::std_collection()
std_collection
Description
Cleans and standardizes herbarium acronyms in biodiversity datasets by harmonizing values from the collectionCode and institutionCode fields. The function corrects common issues in GBIF and other aggregated records, replacing ambiguous or placeholder codes with recognized herbarium acronyms. It also flags missing values, applies fallback rules, and can optionally retain the original columns.
Details
This function is part of the barRoso package. It applies a large set of conditional replacements based on known patterns and falls back to institutionCode when collectionCode is missing or ambiguous. Common aliases such as "Herbarium", "Botany", or "Angiosperms" are converted to valid acronyms when possible.
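The fallback step described above can be illustrated with a minimal base R sketch. This is not the barRoso implementation: the helper name fallback_collection_code, the list of generic labels, and the example rows are assumptions made for illustration only, and the real function applies a much larger set of replacements.

```r
# Hypothetical illustration of the fallback idea; not the barRoso source code.
# Generic labels taken from the aliases mentioned in the Details text.
ambiguous <- c("Herbarium", "Botany", "Angiosperms")

fallback_collection_code <- function(collection, institution) {
  collection  <- trimws(as.character(collection))
  institution <- trimws(as.character(institution))
  # A code needs a fallback when it is missing, empty, or a generic label.
  needs_fallback <- is.na(collection) | collection == "" |
    collection %in% ambiguous
  # Only fall back when the institution code itself is usable.
  usable <- !is.na(institution) & institution != ""
  replace_idx <- needs_fallback & usable
  collection[replace_idx] <- institution[replace_idx]
  collection
}

# Made-up example rows, for demonstration only.
records <- data.frame(
  collectionCode  = c("RB", "Herbarium", NA, "Botany"),
  institutionCode = c("JBRJ", "NY", "K", NA),
  stringsAsFactors = FALSE
)
records$collectionCode <- fallback_collection_code(
  records$collectionCode, records$institutionCode
)
```

In this sketch a generic label with no usable institution code is left unchanged; how std_collection() treats that case is not stated here.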
Arguments
Argument | Description |
---|---|
df | A data frame with biodiversity specimen records. |
colname_collectionCode | Name of the column containing collection codes (default: "collectionCode"). |
colname_institutionCode | Name of the column containing institution codes (default: "institutionCode"). |
rm_original_column | Logical; if TRUE, original columns are removed after cleaning (default: TRUE). |
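Because the two column-name arguments must match columns that exist in df, it can help to check them before calling the function. The helper below is a hypothetical sketch in base R, not part of barRoso, and the example data frame is made up.

```r
# Hypothetical pre-flight check; not part of the barRoso package.
check_code_columns <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found in data frame: ",
         paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up one-row data frame with non-default column names.
records <- data.frame(collection_code = "RB", institution_code = "JBRJ")
check_code_columns(records, c("collection_code", "institution_code"))
```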
Value
A data frame with standardized collection codes in the collectionCode column. If rm_original_column = FALSE, the original values are saved with a *Original suffix.
Examples
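library(barRoso)

# Read a GBIF download and standardize its collection codes,
# keeping the original values alongside the cleaned ones.
df <- read.csv("gbif_download.csv")
df_clean <- std_collection(df,
                           colname_collectionCode = "collection_code",
                           colname_institutionCode = "institution_code",
                           rm_original_column = FALSE)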