Frequently Asked Questions
What is the barRoso?
barRoso
is an R package for cleaning, standardizing, and reconciling plant specimen records—especially from biodiversity repositories like GBIF. It streamlines workflows for curators, researchers, and data scientists by automating the harmonization of taxonomic, geographic, and collector metadata. Optimized for reproducibility, barRoso
facilitates the detection of duplicate specimens and improves data quality across herbarium datasets.
What can I use barRoso for?
- Standardize collector names and associated collector numbers — a crucial step for detecting duplicate specimens across herbaria
- Harmonize taxonomic and geographic fields (e.g., family, genus, locality, country)
- Detect and flag potential duplicate records using cleaned metadata
- Generate herbarium labels from field notebooks, including taxon authority and geolocation maps
- Prepare high-quality datasets for biodiversity, floristic, and taxonomic studies
How do I install barRoso?
Follow the Get Started guide for full instructions on installing the package and its dependencies via GitHub.
Is barRoso free to use?
Yes. barRoso
is open-source software licensed under the MIT License, which permits unrestricted use, modification, and distribution.
What output can barRoso generate?
While barRoso
does not directly produce formatted reports, it supports:
- Cleaned and standardized data frames for downstream biodiversity analysis
- Visual maps in herbarium label generation
- Easy integration with RMarkdown, Quarto, and Shiny for reporting and dashboards
How does barRoso work?
Unlike many packages that rely on static dictionaries to clean collector names, barRoso
uses a robust set of regular expressions (regex) to dynamically identify and standardize collector patterns across datasets. This approach allows barRoso
to generalize better across sources and spelling variations — even when names are inconsistently formatted. Combined with harmonization of geographic and taxonomic fields, this makes the package especially powerful for detecting duplicates and preparing data for downstream analysis.
Why doesn’t barRoso
remove incomplete or inconsistent records?
Because standardization is more powerful than deletion. Many biodiversity records contain valuable metadata even when some fields are missing or messy. barRoso
standardizes and flags potential problems but leaves the final decision to the user—preserving data integrity and maximizing research potential.
Who are the developers of barRoso?
Development is sponsored by JBRJ, FINEP, and CNPq. Main developer:
- Domingos Cardoso (@DBOSlab)
View the code and contribute on GitHub: https://github.com/DBOSlab/barRoso
Why the name barRoso?
The name honors Graziela Maciel Barroso (1912–2003), a pioneering Brazilian botanist. It also serves as an acronym: Biodiversity Analysis and Record Reconciliation for Organizing Specimen Observations, highlighting both its mission and its deep roots in the R programming ecosystem.
Where can I report bugs or request features?
Submit issues on GitHub: barRoso Issues.
Where can I ask questions or join the community?
Visit GitHub Discussions to ask questions, share ideas, or connect with other users.