barRoso::barroso_std()
barroso_std
Description
A wrapper function that performs integrated cleaning and standardization of biodiversity collection records using the barRoso package. This includes harmonizing taxonomic, geographic, collector, and type status information, as well as flagging or removing unvouchered and duplicate specimens.
Details
This function orchestrates several std_* functions from the barRoso package to clean records from virtual herbaria and biodiversity portals. It handles multilingual field names, missing data, inconsistent formatting, and dataset chunking for large inputs. The function also detects and optionally removes duplicate records and specimens lacking voucher information.
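The chunking mentioned above can be pictured with a minimal base R sketch. This is not barRoso's internal code: `process_in_chunks()`, `chunk_size`, and `toy_clean()` are hypothetical names used only for illustration, and the sketch assumes the cleaning step works row by row, so that chunks can be processed independently and recombined.

```r
# Hypothetical helper (not part of barRoso): apply a row-wise cleaning
# function to a data frame in fixed-size chunks, then bind the results.
process_in_chunks <- function(df, clean_fun, chunk_size = 1000L) {
  if (nrow(df) == 0L) return(clean_fun(df))
  # Assign each row to a chunk id: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)
  chunks <- split(df, chunk_id)
  cleaned <- lapply(chunks, clean_fun)
  out <- do.call(rbind, cleaned)
  rownames(out) <- NULL  # drop the chunk-prefixed row names added by rbind
  out
}

# Toy cleaning step (an assumption, not a barRoso function):
# trim whitespace and upper-case the country column.
toy_clean <- function(d) {
  d$country <- toupper(trimws(d$country))
  d
}

raw <- data.frame(
  country = c(" brazil", "Peru ", "brazil", "Chile", "peru"),
  stringsAsFactors = FALSE
)

# Process two rows at a time; the result matches cleaning all rows at once.
chunked <- process_in_chunks(raw, toy_clean, chunk_size = 2L)
```

Chunking of this kind only bounds memory use for steps that do not need to compare rows across chunks; a step such as duplicate detection would still have to see the full dataset.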
Arguments
Argument | Description |
---|---|
… | Input data frame containing raw biodiversity records. |
unvouchered | Logical; if TRUE, removes unvouchered wood/seed/spirit specimens (default: TRUE). |
delunkcoll | Logical; if TRUE, removes records with unknown collectors (default: FALSE). |
flag_missid | Reserved for future use. Currently not implemented. |
flag_duplicates | Logical; if TRUE, flags duplicates with a logical column (default: TRUE). |
rm_duplicates | Logical; if TRUE, removes duplicate specimens (default: FALSE). |
colname_recordedBy | Column name for collector names. |
colname_recordNumber | Column name for collector number. |
colname_continent | Column name for continent. |
colname_country | Column name for country. |
colname_stateProvince | Column name for state/province. |
colname_county | Column name for county. |
colname_municipality | Column name for municipality. |
colname_locality | Column name for locality. |
colname_collectionCode | Column name for collection code. |
colname_institutionCode | Column name for institution code. |
colname_typeStatus | Column name for type status. |
colname_family | Column name for family. |
colname_genus | Column name for genus. |
colname_specificEpithet | Column name for specific epithet. |
rm_original_column | Logical; if TRUE, removes the original columns after standardization. |
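The colname_* arguments let the caller point the function at columns whose names differ from the standard ones (for example, Portuguese field names). As a rough illustration of that idea, and not of how barRoso implements it, the base R sketch below renames user-specified columns to standard terms. `rename_to_standard()` and its `mapping` argument are hypothetical.

```r
# Hypothetical helper (not part of barRoso): rename user-specified columns
# to standard terms. `mapping` is a named character vector whose names are
# the standard terms and whose values are the column names in the data.
rename_to_standard <- function(df, mapping) {
  missing_cols <- setdiff(mapping, names(df))
  if (length(missing_cols) > 0L) {
    stop("Columns not found in data: ", paste(missing_cols, collapse = ", "))
  }
  idx <- match(mapping, names(df))
  names(df)[idx] <- names(mapping)
  df
}

raw <- data.frame(
  pais = c("Brasil", "Peru"),
  estado = c("Bahia", "Loreto"),
  stringsAsFactors = FALSE
)

std <- rename_to_standard(raw, c(country = "pais", stateProvince = "estado"))
names(std)
```

Failing early on a column name that is not present in the data makes a misspelled colname_* value easy to spot before any cleaning is attempted.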
Value
A fully cleaned and standardized data frame ready for downstream reconciliation, duplicate handling, and label generation.
Examples
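# Read the raw records exported from a herbarium portal
df <- read.csv("raw_herbarium_data.csv")

# Standardize the records, mapping the Portuguese column names
# and removing duplicate specimens
df_std <- barroso_std(df,
                      colname_country = "pais",
                      colname_stateProvince = "estado",
                      rm_duplicates = TRUE)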