Flexible Column Names

Overview

Herbarium datasets vary widely in how they name their columns. One dataset might label a column as estado, while another might use stateProvince or province. To make barRoso work with any data source, all standardization functions support flexible column naming.

The barroso_std() function accepts parameters such as colname_country, colname_stateProvince, colname_recordedBy, and more. These let you specify the actual column names in your input data.

This flexibility is essential when cleaning datasets from mixed sources like GBIF, speciesLink, JABOT, or SEINet.

Specifying Custom Column Names

Use named arguments in barroso_std() to point to your actual column names. For example:

library(barRoso)

# Assume your dataset uses "pais" and "estado" instead of "country" and "stateProvince"
df <- read.csv("raw_data.csv")

df_std <- barroso_std(df,
                      colname_country = "pais",
                      colname_stateProvince = "estado",
                      colname_recordedBy = "coletor",
                      colname_recordNumber = "numero",
                      colname_locality = "localidade",
                      colname_family = "familia",
                      colname_genus = "genero",
                      colname_specificEpithet = "epiteto",
                      rm_duplicates = TRUE)

Why This Matters

  • Ensures compatibility with multilingual datasets
  • Prevents failure due to missing or misnamed columns
  • Helps merge and clean records from disparate herbaria

Recommendations

  • Always check the column headers of your CSV or data frame before calling barroso_std()
  • Rename fields only if necessary — using this feature gives you more flexibility

This makes barRoso highly generalizable for data standardization across any herbarium or biodiversity dataset.