std_taxa

Standardize Taxonomic Columns in Biodiversity Records
barRoso::std_taxa()

Description

Cleans and standardizes taxonomic fields in a biodiversity collection dataset. The function harmonizes the family, genus, and specificEpithet columns: it corrects legacy family names (e.g. Leguminosae → Fabaceae), removes ambiguous entries, and formats genus and species names consistently.

Details

This function is part of the barRoso package and is designed to improve the quality of taxon names for reconciliation, querying, and label generation. It removes common taxonomic noise such as markers of uncertain identification (e.g. “cf.”, “aff.”, “indet.”), numeric placeholders, and genus-only labels mistakenly stored in the species field. Genus names are capitalized, and legacy family names (such as Leguminosae) are replaced with their accepted equivalents (e.g. Fabaceae).
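
The cleaning rules described above can be illustrated with a small base-R sketch. This is not the barRoso implementation: clean_taxa_sketch() and legacy_families are hypothetical names, the Compositae entry is added only for illustration, and setting uncertain epithets to NA (rather than stripping the qualifier) is an assumption about the intended behavior.

```r
# Illustrative sketch only; NOT the actual barRoso::std_taxa() code.
# legacy_families and clean_taxa_sketch() are hypothetical names.
legacy_families <- c(Leguminosae = "Fabaceae", Compositae = "Asteraceae")

clean_taxa_sketch <- function(df) {
  # Replace legacy family names with their accepted equivalents
  fam <- trimws(df$family)
  is_legacy <- fam %in% names(legacy_families)
  fam[is_legacy] <- unname(legacy_families[fam[is_legacy]])
  df$family <- fam

  # Capitalize the genus: first letter upper case, the rest lower case
  gen <- trimws(df$genus)
  cap <- paste0(toupper(substr(gen, 1, 1)), tolower(substring(gen, 2)))
  df$genus <- ifelse(is.na(gen), NA_character_, cap)

  # Flag epithets that carry no species-level information
  epi <- tolower(trimws(df$specificEpithet))
  uncertain <- grepl("^(cf|aff|indet)\\.?( |$)", epi)   # "cf.", "aff.", "indet."
  numeric_placeholder <- grepl("[0-9]", epi)            # e.g. "sp. 1"
  genus_in_epithet <- !is.na(epi) & !is.na(df$genus) &
    epi == tolower(df$genus)                            # genus stored as species
  drop <- uncertain | numeric_placeholder | genus_in_epithet | (epi %in% "")

  # Assumption: flagged epithets become NA instead of being partially kept
  epi[drop] <- NA_character_
  df$specificEpithet <- epi
  df
}

# Small made-up data frame, used only to exercise the sketch
records <- data.frame(
  family = c("Leguminosae", "Fabaceae", "Compositae"),
  genus = c("inga", "MIMOSA", "Baccharis"),
  specificEpithet = c("edulis", "cf. pudica", "baccharis"),
  stringsAsFactors = FALSE
)
cleaned <- clean_taxa_sketch(records)
print(cleaned)
```

The sketch hard-codes the default column names; std_taxa() itself takes them through the colname_* arguments documented below.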

Arguments

Argument                  Description
df                        A data frame with biodiversity collection records.
colname_family            Name of the column containing plant family names (default: "family").
colname_genus             Name of the column containing genus names (default: "genus").
colname_specificEpithet   Name of the column containing the specific epithets (default: "specificEpithet").
rm_original_column        Logical; if TRUE, original columns are removed after cleaning (default: TRUE).

Value

A data frame with cleaned and standardized family, genus, and specificEpithet columns. If rm_original_column = FALSE, the original values are retained in columns with a *Original suffix.

Examples

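library(barRoso)

# Read collection records whose taxonomic columns use non-default names
df <- read.csv("taxa.csv")

# Standardize the taxonomic columns, keeping the original values
df_clean <- std_taxa(df,
                     colname_family = "familia",
                     colname_genus = "genero",
                     colname_specificEpithet = "especie",
                     rm_original_column = FALSE)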