Flexible Column Names
Overview
Herbarium datasets vary widely in how they name their columns. One dataset might label a column as estado, while another might use stateProvince or province. To make barRoso work with any data source, all standardization functions support flexible column naming.
The barroso_std() function accepts parameters such as colname_country, colname_stateProvince, colname_recordedBy, and more. These let you specify the actual column names in your input data.
This flexibility is essential when cleaning datasets from mixed sources like GBIF, speciesLink, JABOT, or SEINet.
Specifying Custom Column Names
Use named arguments in barroso_std() to point to your actual column names. For example:
library(barRoso)

# Assume your dataset uses "pais" and "estado" instead of "country" and "stateProvince"
df <- read.csv("raw_data.csv")

df_std <- barroso_std(df,
                      colname_country = "pais",
                      colname_stateProvince = "estado",
                      colname_recordedBy = "coletor",
                      colname_recordNumber = "numero",
                      colname_locality = "localidade",
                      colname_family = "familia",
                      colname_genus = "genero",
                      colname_specificEpithet = "epiteto",
                      rm_duplicates = TRUE)
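When records from several sources are cleaned in one script, it can help to keep each source's column names in a small lookup and build the argument list from it. The sketch below is illustrative only: col_maps, build_args, the source labels, and the headers assigned to each source are hypothetical and do not reflect the actual export formats of any provider. The barroso_std() call is left commented out so the snippet runs without the package installed.

```r
# Hypothetical stand-in for one source's export.
df <- data.frame(
  pais = c("Brasil", "Brasil"),
  estado = c("Bahia", "Minas Gerais"),
  stringsAsFactors = FALSE
)

# Hypothetical per-source maps: argument name -> header used in that source.
col_maps <- list(
  source_a = c(colname_country = "pais", colname_stateProvince = "estado"),
  source_b = c(colname_country = "country", colname_stateProvince = "stateProvince")
)

# Combine the data frame (first, unnamed) with the named colname_* arguments.
build_args <- function(data, map) {
  c(list(data), as.list(map))
}

args <- build_args(df, col_maps$source_a)

# With barRoso installed, the list could then be passed along:
# df_std <- do.call(barroso_std, args)
```

Keeping the maps in one place means a new source only needs a new entry, not a new copy of the call.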
Why This Matters
- Ensures compatibility with multilingual datasets
- Prevents failure due to missing or misnamed columns
- Helps merge and clean records from disparate herbaria
Recommendations
- Always check the column headers of your CSV or data frame before calling barroso_std()
- Rename fields only if necessary; pointing the colname_ arguments at your existing headers is more flexible
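The header check recommended above can be done with base R before any barRoso function is called. This is a minimal sketch: the data frame and the wanted vector are made-up examples, and only base R functions are used.

```r
# Hypothetical input standing in for a Portuguese-language export.
df <- data.frame(
  pais = c("Brasil", "Brasil"),
  estado = c("Bahia", "Minas Gerais"),
  coletor = c("Silva, J.", "Souza, M."),
  stringsAsFactors = FALSE
)

# Headers we intend to pass to the colname_* arguments.
wanted <- c(
  colname_country = "pais",
  colname_stateProvince = "estado",
  colname_recordedBy = "coletor",
  colname_recordNumber = "numero"
)

# Keep only the intended headers that are absent from the data.
missing_cols <- wanted[!(wanted %in% names(df))]

if (length(missing_cols) > 0) {
  message("Not found in data: ", paste(missing_cols, collapse = ", "))
}
```

Because missing_cols keeps its names, it also shows which argument each absent header was meant for.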
This makes barRoso highly generalizable for data standardization across any herbarium or biodiversity dataset.