Filter Indeterminate REFLORA Specimens

This guide explains how to use the reflora_indets() function in the refloraR package to retrieve occurrence records of indeterminate specimens (e.g., identified only to family or genus level) from the REFLORA Virtual Herbarium hosted by the Rio de Janeiro Botanical Garden.

Function Overview

reflora_indets() is designed to identify and extract plant specimen records from REFLORA collections that are not determined to species level. It allows filtering by herbarium, taxon, locality (Brazilian state), year, and taxonomic rank.

This function is especially useful for identifying possible Linnean shortfalls (gaps between the species that exist and those that have been formally described) by summarizing the number of undetermined specimens across different herbaria or within a particular plant family. It helps researchers pinpoint where plant identification in REFLORA is incomplete.
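The idea of "not determined to species level" can be illustrated on a small mock table. This is a minimal sketch, not the package's implementation: the column names (collectionCode, family, taxonRank) follow Darwin Core terms and are assumptions about the data, and filter_indets() is a hypothetical helper written only for this illustration.

```r
# Mock occurrence table; column names follow Darwin Core terms and are
# assumptions, not confirmed refloraR output.
specimens <- data.frame(
  collectionCode = c("RB", "RB", "SP", "SP", "K"),
  family         = c("Fabaceae", "Fabaceae", "Myrtaceae", "Fabaceae", "Myrtaceae"),
  taxonRank      = c("FAMILY", "SPECIES", "GENUS", "FAMILY", "SPECIES"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep records whose identification stops at one of
# the requested ranks (case-insensitive match).
filter_indets <- function(df, level = c("FAMILY", "GENUS")) {
  df[toupper(df$taxonRank) %in% toupper(level), , drop = FALSE]
}

# Records identified only to family level
family_indets <- filter_indets(specimens, level = "FAMILY")

# Records identified only to family or genus level
all_indets <- filter_indets(specimens)
```

Here family_indets keeps the two family-level records and all_indets adds the genus-level one; records identified to species are dropped in both cases.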

Arguments

Argument     Description
level        Taxonomic rank to filter indeterminates (e.g., "FAMILY", "GENUS").
herbarium    Herbarium codes (e.g., "RB", "SP"). Use NULL for all collections.
repatriated  Logical. If FALSE, skips repatriated collections.
taxon        Vector of family or genus names to filter by.
state        Brazilian state abbreviations to filter locality.
recordYear   Year or range of years for filtering (e.g., "2001", c("2000", "2022")).
reorder      Reorder columns in the final output. Default is by herbarium, taxa, etc.
path         Directory containing previously downloaded REFLORA data. If provided, avoids re-downloading and allows offline or reproducible analyses. Especially important when analyzing a frozen dataset or shared repository.
updates      Logical. If TRUE, updates DwC-A archives from REFLORA if versions are outdated.
verbose      Logical. If TRUE, prints progress messages.
save         Logical. If TRUE, saves the result as a CSV file.
dir          Directory to store CSV output.
filename     Output filename for results and logs.

Use Case: Retrieve All Family-Level Indeterminates

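library(refloraR)

# Retrieve every specimen identified only to family level, across all
# herbaria, and save the result as a CSV file in the "reflora_indets" folder.
reflora_indets(
  level = "FAMILY",
  verbose = TRUE,
  save = TRUE,
  dir = "reflora_indets",
  filename = "family_level_indets"
)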

Filter by Taxon, Herbarium, State, and Year

reflora_indets(
  taxon = "Fabaceae",
  herbarium = "RB",
  state = c("BA", "MG"),
  recordYear = c("1990", "2022")
)
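
How a state filter and a year range combine can be sketched on mock data. The column names (stateProvince, year) and the helper filter_state_year() are hypothetical, and treating a two-element recordYear as an inclusive start/end pair is an assumption about the intended behavior rather than something confirmed by the package.

```r
# Mock records; column names are assumptions for illustration only.
records <- data.frame(
  stateProvince = c("BA", "MG", "SP", "BA", "MG"),
  year          = c(1985, 1990, 2005, 2022, 2023),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a single year selects that year only; two years are
# treated as an inclusive range (an assumption).
filter_state_year <- function(df, state, recordYear) {
  yrs <- range(as.integer(recordYear))
  keep <- df$stateProvince %in% state &
    df$year >= yrs[1] & df$year <= yrs[2]
  df[keep, , drop = FALSE]
}

# Records from BA or MG collected between 1990 and 2022
subset_range <- filter_state_year(records, c("BA", "MG"), c("1990", "2022"))

# Records from BA or MG collected in 1990 only
subset_single <- filter_state_year(records, c("BA", "MG"), "1990")
```

Both conditions must hold for a record to be kept, so the SP record is excluded even though its year falls inside the range.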

Filter by Taxonomic Rank and Year

reflora_indets(
  level = "GENUS",
  recordYear = "2020",
  save = FALSE
)

Analyzing a Specific Dataset

You can reuse previously downloaded REFLORA data by setting the path argument:

reflora_indets(
  path = "my_reflora_data",
  updates = FALSE,
  verbose = TRUE
)

Why Use path?

  • It allows working offline with already downloaded REFLORA archives.
  • It avoids redundant downloads, speeding up analysis.
  • It improves reproducibility by analyzing a specific static dataset.
  • Combined with updates = FALSE, it leaves your local data unmodified.
  • It is ideal when working with a shared folder or preserving frozen datasets.
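
One way to confirm that a frozen dataset really stayed frozen is to record file checksums before and after an analysis. The sketch below uses tools::md5sum() from base R; the folder contents are a stand-in created in a temporary directory, and snapshot_checksums() is a hypothetical helper, not part of refloraR.

```r
# Create a stand-in data folder so the sketch runs anywhere; in practice
# this would be the folder passed to the path argument.
data_dir <- file.path(tempdir(), "my_reflora_data")
dir.create(data_dir, showWarnings = FALSE)
writeLines("occurrenceID,family", file.path(data_dir, "occurrence.txt"))

# Hypothetical helper: one MD5 checksum per file, keyed by relative path.
snapshot_checksums <- function(dir) {
  files <- sort(list.files(dir, recursive = TRUE))
  data.frame(
    file = files,
    md5  = unname(tools::md5sum(file.path(dir, files))),
    stringsAsFactors = FALSE
  )
}

before <- snapshot_checksums(data_dir)
# ... run the analysis here with path = data_dir and updates = FALSE ...
after <- snapshot_checksums(data_dir)

# TRUE when no file was added, removed, or changed
unchanged <- identical(before, after)
```

Storing the before table alongside your results gives collaborators a simple way to check that they are analyzing the same files.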

Use Case: Linnean Shortfalls Insight

By summarizing undetermined records across herbaria or within a taxon, reflora_indets() works as a diagnostic tool for gaps in floristic knowledge. It can help researchers:

  • Visualize how many specimens are unidentified across collections.
  • Prioritize taxonomic work on poorly determined groups.
  • Highlight herbaria with the highest volume of unidentified material.
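
These summaries can be sketched with base R on a mock result. The table below is invented for illustration, and the column names (collectionCode, family) are Darwin Core terms assumed to be present in the output.

```r
# Mock set of indeterminate records; values are invented for illustration.
indets <- data.frame(
  collectionCode = c("RB", "RB", "RB", "SP", "SP", "K"),
  family         = c("Fabaceae", "Fabaceae", "Myrtaceae",
                     "Fabaceae", "Myrtaceae", "Fabaceae"),
  stringsAsFactors = FALSE
)

# Unidentified records per herbarium, largest first
by_herbarium <- sort(table(indets$collectionCode), decreasing = TRUE)

# Unidentified records per family, largest first
by_family <- sort(table(indets$family), decreasing = TRUE)

# Family-by-herbarium cross-tabulation, to see where each group accumulates
cross_tab <- table(indets$family, indets$collectionCode)
```

The first entry of by_herbarium names the collection with the most unidentified material, and cross_tab shows which families drive that total.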

Visualizing Indeterminates per Herbarium

library(refloraR)
library(dplyr)
library(ggplot2)

# Family-level indeterminates from five herbaria, kept in memory only
results <- reflora_indets(level = "FAMILY",
                          herbarium = c("ALCB", "HUEFS", "RB", "K", "E"),
                          save = FALSE)

# Count records per herbarium and draw horizontal bars ordered by count
results %>%
  count(collectionCode) %>%
  ggplot(aes(x = reorder(collectionCode, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Herbarium",
       y = "Number of Indeterminate Specimens at Family Level")

Tips

  • Use reflora_summary() to inspect available collections.
  • For reproducibility, document your input parameters.
  • Logs and CSV files are saved automatically if save = TRUE.
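
One simple way to document input parameters is to keep them in a list and write that list to a file next to the results. This is a sketch using base R's dput() and dget(); the file name and the parameter values are arbitrary examples.

```r
# Parameters for a run, kept in one list so they can be saved and reused.
params <- list(
  level      = "GENUS",
  herbarium  = c("RB", "SP"),
  state      = c("BA", "MG"),
  recordYear = c("1990", "2022"),
  save       = FALSE
)

# Write the list as R source; the file name is an arbitrary example.
params_file <- file.path(tempdir(), "reflora_indets_params.R")
dput(params, file = params_file)

# Later, read the parameters back and confirm they are unchanged.
restored <- dget(params_file)
round_trip_ok <- identical(params, restored)

# With refloraR installed, the same list can drive the call:
# results <- do.call(refloraR::reflora_indets, restored)
```

Because the call is built from the stored list, the saved file records exactly what was run.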

See Also