REFLORA Cleaning Workflow

This article walks through a practical case study: plant specimen records are downloaded from the REFLORA Virtual Herbarium with the refloraR package, then cleaned and harmonized with the barRoso package.

Goal

Demonstrate a full cleaning pipeline for biodiversity records from REFLORA, covering:

  • Programmatic data download with refloraR
  • Collector and record number standardization
  • Duplicate detection
  • Output preparation for downstream analysis

Step 1: Install Required Packages

# Uncomment if devtools is not yet installed:
# install.packages("devtools")
# Install both packages from GitHub
devtools::install_github("DBOSlab/refloraR")
devtools::install_github("DBOSlab/barRoso")
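
When the workflow is rerun often, it can be convenient to install only what is missing. The sketch below is one possible way to do that in base R. Note that missing_pkgs() is a hypothetical helper written for this example, not a function from refloraR or barRoso; the repository names are the ones used above.

```r
# Hypothetical helper (not part of refloraR or barRoso):
# return the names of packages that cannot be loaded, i.e. are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# GitHub repositories used in this article, keyed by package name
repos <- c(refloraR = "DBOSlab/refloraR", barRoso = "DBOSlab/barRoso")

to_install <- missing_pkgs(names(repos))

# Uncomment to install only the missing packages (requires devtools):
# for (pkg in to_install) devtools::install_github(repos[[pkg]])
```

The install loop is left commented out so that running the block has no effect beyond checking which packages are available.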

Step 2: Download Specimens from REFLORA

Use the refloraR package to retrieve specimen records for a given taxon and herbarium:

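library(refloraR)

# Fabaceae records held at herbarium RB; save = FALSE is assumed to keep
# the result in memory only rather than also writing a file to disk
records <- reflora_records(taxon = "Fabaceae",
                           herbarium = "RB",
                           save = FALSE)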
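
Before standardizing, it may be worth confirming that the download contains the columns that Step 3 refers to. The following is a minimal base R sketch, using a small stand-in data frame with invented values in place of the real records object. check_columns() is a hypothetical helper, and the required column names are simply those passed to barroso_std() in Step 3.

```r
# Hypothetical helper: return the required columns that are absent from df.
check_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Stand-in for the downloaded `records` (invented values, illustration only)
records_demo <- data.frame(
  recordedBy   = c("Silva, J.", "Souza, M."),
  recordNumber = c("123", "s.n."),
  country      = c("Brazil", "Brazil"),
  stringsAsFactors = FALSE
)

required <- c("recordedBy", "recordNumber", "country", "stateProvince")
missing_cols <- check_columns(records_demo, required)

# Report any gaps before moving on to the cleaning step
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

With real data, replace records_demo with records; an empty result means all four columns are present.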

Step 3: Run barRoso Standardization

library(barRoso)

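# Point barroso_std() at the collector, number, and locality columns;
# flag suspected duplicates but keep every row (rm_duplicates = FALSE)
cleaned <- barroso_std(records,
                       colname_recordedBy = "recordedBy",
                       colname_recordNumber = "recordNumber",
                       colname_country = "country",
                       colname_stateProvince = "stateProvince",
                       flag_duplicates = TRUE,
                       rm_duplicates = FALSE)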

Step 4: Explore Results

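# Tabulate the values of the duplicate flag
table(cleaned$duplicate)
# Preview collector, record number, and flag for the first rows
head(cleaned[, c("recordedBy", "recordNumber", "duplicate")])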
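
As a rough cross-check that does not depend on the duplicate column, records sharing the same collector and record number can be counted directly. The sketch below uses base R only, on a stand-in data frame with invented values. It is an exact-match comparison written for illustration and is not a description of how barRoso detects duplicates; n_same_key is a hypothetical column name.

```r
# Stand-in for `cleaned` (invented values, illustration only)
cleaned_demo <- data.frame(
  recordedBy   = c("Silva", "Silva", "Souza", "Souza"),
  recordNumber = c("123", "123", "45", "46"),
  stringsAsFactors = FALSE
)

# Build a collector + number key and count how many rows share each key
key <- paste(cleaned_demo$recordedBy, cleaned_demo$recordNumber, sep = "_")
cleaned_demo$n_same_key <- as.integer(ave(seq_along(key), key, FUN = length))

# Rows whose key occurs more than once are candidate duplicates
candidates <- cleaned_demo[cleaned_demo$n_same_key > 1, ]
```

If the candidates found this way differ greatly from the flagged records, the differences are a natural place to start a manual review.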

Step 5: Save Cleaned Output

write.csv(cleaned, "reflora_cleaned.csv", row.names = FALSE)
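
A quick round trip can confirm that the file reads back as written. The sketch below is one way to check this in base R, again on a stand-in data frame with invented values and a temporary file. Reading every column as character is a choice made here so that record numbers with leading zeros are not converted to numbers; the explicit UTF-8 encoding is likewise an assumption about what suits collector names with accented characters.

```r
# Stand-in for `cleaned` (invented values, illustration only)
cleaned_demo <- data.frame(
  recordedBy   = c("Silva", "Souza"),
  recordNumber = c("00123", "45"),
  stringsAsFactors = FALSE
)

# Write to a temporary file with an explicit encoding
out_file <- tempfile(fileext = ".csv")
write.csv(cleaned_demo, out_file, row.names = FALSE, fileEncoding = "UTF-8")

# Read everything back as character so values like "00123" keep leading zeros
reloaded <- read.csv(out_file, colClasses = "character",
                     fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# TRUE when dimensions and record numbers survive the round trip
same <- identical(dim(reloaded), dim(cleaned_demo)) &&
  all(reloaded$recordNumber == cleaned_demo$recordNumber)
```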

Summary

With this REFLORA case study, we demonstrated how to:

  • Download REFLORA data programmatically with refloraR
  • Clean and harmonize biodiversity records using barRoso
  • Detect and flag duplicates across herbarium specimens

This workflow can be reused for any taxon and institution that REFLORA supports. As a next step, explore additional tools such as barroso_cat() to merge datasets from GBIF, JABOT, and speciesLink.