Overview


What is barRoso?

barRoso is an R package for cleaning, standardizing, and reconciling plant specimen records, especially those retrieved from herbarium databases and biodiversity repositories. Originally developed for GBIF data, it is also compatible with data from JABOT, SEINet, REFLORA, speciesLink, and others.

The package is named after Brazilian botanist Graziela Maciel Barroso, and also serves as an acronym: Biodiversity Analysis and Record Reconciliation for Organizing Specimen Observations.

What does barRoso do?

barRoso helps users:

  • Standardize collector names and collection numbers using regex-based parsing
  • Harmonize taxonomic, geographic, and temporal fields
  • Flag and remove potential duplicates across herbarium records
  • Generate herbarium labels from fieldbook data
  • Integrate with external taxonomic databases (e.g., LCVP, WFO)
  • Prepare large-scale biodiversity datasets for publication and analysis

Who should use barRoso?

  • Taxonomists and herbarium curators
  • Biodiversity data scientists
  • Floristic inventory teams
  • Graduate students and researchers managing field collections

What makes barRoso different?

Unlike packages that rely on static dictionaries of collector names, barRoso uses extensible regular expressions to detect name patterns and correct inconsistencies across data sources. Its functions are modular and tidyverse-compatible, which supports a reproducible and efficient workflow.
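To make the regex-based idea concrete, here is a minimal sketch in base R. It is not barRoso's implementation or API: the function name standardize_collector and the target format "Surname, I.I." are assumptions chosen for illustration, and the patterns cover only a few common layouts (they do not handle surname particles such as "de" or "van").

```r
# Hypothetical helper, NOT part of barRoso: reduce a collector string to
# "Surname, I.I." using base R regular expressions only.
standardize_collector <- function(x) {
  # Collapse repeated whitespace and trim the ends
  x <- trimws(gsub("\\s+", " ", x))
  # Keep only the first collector when several are listed
  x <- sub("\\s*(;|&|\\bet al\\.?|\\band\\b).*$", "", x, perl = TRUE)

  vapply(x, function(name) {
    if (grepl(",", name, fixed = TRUE)) {
      # "Surname, Given names" layout
      parts   <- strsplit(name, ",\\s*")[[1]]
      surname <- parts[1]
      given   <- if (length(parts) > 1) parts[2] else ""
    } else {
      # "Given names Surname" layout: the last token is taken as the surname
      tokens  <- strsplit(name, " ")[[1]]
      surname <- tokens[length(tokens)]
      given   <- paste(tokens[-length(tokens)], collapse = " ")
    }
    # Normalize surname capitalization, e.g. "BARROSO" -> "Barroso"
    surname <- paste0(toupper(substring(surname, 1, 1)),
                      tolower(substring(surname, 2)))
    # First letter of each given-name token becomes an initial
    initials <- regmatches(given,
                           gregexpr("\\b[[:alpha:]]", given, perl = TRUE))[[1]]
    if (length(initials) == 0) {
      return(surname)
    }
    paste0(surname, ", ", paste0(toupper(initials), ".", collapse = ""))
  }, character(1), USE.NAMES = FALSE)
}

variants <- c("G. M. Barroso", "Barroso, G.M.",
              "Graziela M. Barroso", "BARROSO, G. M.; Silva, J.")
standardize_collector(variants)
```

Because the rules are patterns rather than a lookup table, a collector who has never been seen before is still parsed, which is the advantage over a static dictionary described above.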

Data Preservation Philosophy

Many data tools prioritize aggressive cleaning, often at the cost of discarding valuable records. barRoso takes a different approach: its philosophy centers on standardization rather than removal. All herbarium specimens carry potential scientific value, even when their records are incomplete or inconsistently entered. Instead of omitting such records, barRoso harmonizes their fields to make them comparable across collections. By standardizing collector names, geographic fields, and taxonomic labels, it lets users flag inconsistencies rather than erase them, which makes workflows more transparent and helps trace potential misidentifications, especially across distributed duplicates. This inclusive approach honors the archival role of herbaria while facilitating reproducible biodiversity research.
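The flag-rather-than-remove idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not barRoso's code: the toy records, the column names (herbarium, collector, number, species), and the flag columns is_duplicate and id_conflict are all invented for the example.

```r
# Toy records; values and column names are illustrative, not barRoso's schema
records <- data.frame(
  herbarium = c("RB", "NY", "SP", "RB"),
  collector = c("Barroso, G.M.", "Barroso, G.M.", "Silva, J.", "Barroso, G.M."),
  number    = c("1234", "1234", "88", NA),
  species   = c("Eugenia uniflora", "Eugenia sp.",
                "Myrcia splendens", "Eugenia uniflora"),
  stringsAsFactors = FALSE
)

# Duplicate key: standardized collector plus collection number.
# Records missing either field get no key, but they are kept.
incomplete <- is.na(records$collector) | is.na(records$number)
key <- ifelse(incomplete, NA,
              paste(records$collector, records$number, sep = "_"))

# Flag every member of a duplicate group instead of dropping rows
records$is_duplicate <- !is.na(key) &
  (duplicated(key) | duplicated(key, fromLast = TRUE))

# Count distinct identifications within each duplicate group
n_ids <- vapply(key, function(k) {
  if (is.na(k)) 0L else length(unique(records$species[which(key == k)]))
}, integer(1), USE.NAMES = FALSE)

# Duplicates whose identifications disagree are candidates for review
records$id_conflict <- records$is_duplicate & n_ids > 1

records
```

All four rows survive. The two sheets of the same gathering held at different herbaria are flagged as duplicates with conflicting identifications, and the record without a collection number is retained unflagged rather than discarded.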

Learn more

Explore the full documentation: