Overview
What is barRoso?
barRoso
is an R package designed to clean, standardize, and reconcile plant specimen records, especially those retrieved from herbarium databases and biodiversity repositories. Originally developed for GBIF, it is compatible with data from JABOT, SEINet, REFLORA, speciesLink, and others.
The package is named after Brazilian botanist Graziela Maciel Barroso, and also serves as an acronym: Biodiversity Analysis and Record Reconciliation for Organizing Specimen Observations.
What does barRoso do?
barRoso
helps users:
- Standardize collector names and collection numbers using regex-based parsing
- Harmonize taxonomic, geographic, and temporal fields
- Flag and remove potential duplicates across herbarium records
- Generate herbarium labels from fieldbook data
- Integrate with external taxonomic databases (e.g., LCVP, WFO)
- Prepare large-scale biodiversity datasets for publication and analysis
Who should use barRoso?
- Taxonomists and herbarium curators
- Biodiversity data scientists
- Floristic inventory teams
- Graduate students and researchers managing field collections
What makes barRoso different?
Unlike other packages that rely on static collector name dictionaries, barRoso
uses robust, extensible regular expressions to detect name patterns and correct inconsistencies across data sources. It also includes modular, tidyverse-compatible functions for a reproducible and efficient workflow.
Data Preservation Philosophy
While many data tools prioritize aggressive cleaning, often at the cost of discarding valuable records, barRoso
takes a different approach. Its philosophy centers on standardization rather than removal. All herbarium specimens carry potential scientific value, even when incomplete or inconsistently entered. Instead of omitting such records, barRoso
focuses on harmonizing fields to enhance comparability across collections. By standardizing collector names, geographic fields, and taxonomic labels, barRoso
allows users to flag rather than erase inconsistencies, enabling more transparent workflows and tracing potential misidentifications, especially across distributed duplicates. This inclusive approach honors the archival role of herbaria while facilitating reproducible biodiversity research.
Learn more
Explore the full documentation: