barRoso::std_taxa()
Description
Cleans and standardizes taxonomic fields in a biodiversity collection dataset. Specifically targets and harmonizes the family, genus, and specificEpithet columns, correcting legacy naming (e.g. Leguminosae → Fabaceae), removing ambiguous entries, and formatting genus/species names for consistency.
Details
This function is part of the barRoso package and is designed to improve the quality of taxon names for reconciliation, querying, and label generation. It removes common taxonomic noise such as qualifiers of uncertain identification (e.g. “cf.”, “aff.”, “indet.”), numeric placeholders, and genus-only labels mistakenly stored in the species field. Genus names are capitalized, and legacy family names (e.g. Leguminosae) are replaced with their accepted equivalents (e.g. Fabaceae).
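The cleaning steps described above can be sketched in base R. This is a rough illustration under stated assumptions, not the barRoso implementation: the helper `clean_taxa_sketch()` and the lookup `legacy_families` are hypothetical names, the Compositae entry is added only as a second example of a legacy name, and the sketch assumes that a noisy epithet is blanked to `NA` rather than repaired.

```r
# Illustrative only: NOT the barRoso implementation.
# Hypothetical lookup of legacy family names and their accepted equivalents.
legacy_families <- c(Leguminosae = "Fabaceae", Compositae = "Asteraceae")

clean_taxa_sketch <- function(family, genus, epithet) {
  # Replace legacy family names with their accepted equivalents.
  hit <- family %in% names(legacy_families)
  family[hit] <- unname(legacy_families[family[hit]])

  # Capitalize the genus: first letter upper case, the rest lower case.
  genus <- sub("^(.)(.*)$", "\\U\\1\\L\\2", trimws(genus), perl = TRUE)

  # Assumption: an epithet is "noise" when it is only an uncertainty
  # qualifier, contains digits, or merely repeats the genus name.
  epithet <- trimws(epithet)
  noise <- grepl("^(cf|aff|indet)\\.?$", tolower(epithet)) |
    grepl("[0-9]", epithet) |
    tolower(epithet) == tolower(genus)
  epithet[which(noise)] <- NA_character_

  data.frame(family = family, genus = genus, specificEpithet = epithet,
             stringsAsFactors = FALSE)
}

out <- clean_taxa_sketch(
  family  = c("Leguminosae", "Fabaceae", "Compositae", "Leguminosae"),
  genus   = c("inga", "MIMOSA", "baccharis", "Inga"),
  epithet = c("edulis", "cf.", "sp1", "Inga")
)
```

In this toy input only the first record keeps its epithet; the other three are blanked because they hold a qualifier, a numeric placeholder, and a repeated genus name respectively. How std_taxa() itself treats each case may differ.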
Arguments
| Argument | Description |
|---|---|
| df | A data frame with biodiversity collection records. |
| colname_family | Name of the column containing plant family names (default: "family"). |
| colname_genus | Name of the column containing genus names (default: "genus"). |
| colname_specificEpithet | Name of the column containing the specific epithet of each species name (default: "specificEpithet"). |
| rm_original_column | Logical; if TRUE, original columns are removed after cleaning (default: TRUE). |
Value
A data frame with cleaned and standardized family, genus, and specificEpithet columns. If rm_original_column = FALSE, original values are retained with a *Original suffix.
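To make the effect of rm_original_column concrete, here is a minimal base R sketch of the keep-or-drop pattern. It is an assumption-laden illustration, not barRoso code: `clean_column_sketch()` is a hypothetical helper, `trimws` stands in for the real cleaning logic, and the backup column name (the input column name plus "Original") is only one reading of the suffix described above.

```r
# Hypothetical helper, not part of barRoso: clean one column and optionally
# keep the untouched values in a "<column>Original" column.
clean_column_sketch <- function(df, colname, cleaner, rm_original_column = TRUE) {
  stopifnot(is.data.frame(df), colname %in% names(df))
  original <- df[[colname]]
  df[[colname]] <- cleaner(original)
  if (!rm_original_column) {
    # Assumed naming: the column name followed by the suffix "Original".
    df[[paste0(colname, "Original")]] <- original
  }
  df
}

records <- data.frame(genero = c("inga", " Mimosa "), stringsAsFactors = FALSE)

# Keep the original values alongside the cleaned ones.
kept <- clean_column_sketch(records, "genero", trimws, rm_original_column = FALSE)

# Default behaviour in this sketch: only the cleaned column remains.
dropped <- clean_column_sketch(records, "genero", trimws)
```

Keeping the originals is useful for auditing what the cleaning step changed before the backup columns are discarded.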
Examples
library(barRoso)

df <- read.csv("taxa.csv")
df_clean <- std_taxa(df,
                     colname_family = "familia",
                     colname_genus = "genero",
                     colname_specificEpithet = "especie",
                     rm_original_column = FALSE)