barRoso::std_recordedBy()
std_recordedBy
Description
Cleans and standardizes the recordedBy and recordNumber fields in biodiversity collection data, consolidating collector names and removing inconsistencies across herbarium records. The function identifies and formats collector initials, extracts the main collector name, and handles multilingual and complex name structures, including multiple collectors, names written in Asian Unicode scripts, and Brazilian surname conventions.
Details
This function is part of the barRoso package. It supports the reconciliation of biodiversity records, especially the resolution of collector-name discrepancies across duplicate specimens. When multiple collectors are detected, a new column, addCollector, is created and the secondary collectors are recorded there as "et al.". The original columns can either be retained or removed (see rm_original_column).
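The splitting behaviour described above can be illustrated with a small base-R sketch. It is not the barRoso implementation: the helper name split_collectors(), the separator pattern, and the rule that only the first name stays in recordedBy while any further names are reduced to "et al." in addCollector are assumptions made for illustration.

```r
# Hypothetical sketch, NOT the barRoso implementation.
# Keeps the first collector in `col` and flags any extra collectors
# with "et al." in a new addCollector column.
split_collectors <- function(df, col = "recordedBy") {
  # Assumed separators: "&", ";", "|" and the words and / e / y
  # (English, Portuguese and Spanish "and"), matched case-insensitively.
  sep <- "\\s*[&;|]\\s*|\\s+(?i:and|e|y)\\s+"
  parts <- strsplit(as.character(df[[col]]), sep, perl = TRUE)
  # The first piece becomes the main collector
  df[[col]] <- vapply(parts, function(p) trimws(p[1]), character(1))
  # More than one piece means additional collectors were present
  df$addCollector <- ifelse(lengths(parts) > 1, "et al.", NA_character_)
  df
}

records <- data.frame(
  recordedBy = c("A. Silva & B. Costa", "J. Souza", "M. Lima; P. Rocha; R. Alves")
)
split_collectors(records)
```

A bare " e " separator can also match an unpunctuated initial, as in "M. E Lima", which is one reason real collector strings need more rules than this sketch applies.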
Specifically, this function performs extensive string cleaning, including:
- Converting names written in Unicode scripts such as Chinese to Latin script
- Parsing and normalizing collector names separated by "&", "and", "e", "y", ";", "|", etc. ("e" and "y" are Portuguese and Spanish for "and")
- Handling cases of one, two, or more collectors
- Cleaning spacing, punctuation, and known collector aliases
- Adding standardized initials or removing redundant suffixes (e.g., "et al.")
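Three of the simpler steps in this list (whitespace cleanup, dropping a trailing "et al.", and building initials) can be sketched in base R. The helper names squish(), drop_et_al() and format_initials(), the "Surname, I.I." output format, and the short lists of Brazilian kinship suffixes and name particles are all assumptions for illustration; transliteration of non-Latin scripts and alias lookup are not covered.

```r
# Hypothetical helpers sketching a few of the cleaning steps; NOT barRoso code.

# Collapse runs of whitespace and trim both ends
squish <- function(x) trimws(gsub("\\s+", " ", x))

# Remove a trailing "et al." suffix (also "et al", "et. al."), case-insensitively
drop_et_al <- function(x) {
  squish(sub("(\\s+|\\s*,\\s*)et\\.?\\s*al\\.?\\s*$", "", x, ignore.case = TRUE))
}

# Convert "Given Names Surname" to "Surname, I.I." (assumed target format)
format_initials <- function(x) {
  suffixes  <- c("filho", "junior", "neto", "sobrinho")  # kept with the surname
  particles <- c("de", "da", "do", "das", "dos")          # skipped in initials
  vapply(strsplit(squish(x), " ", fixed = TRUE), function(tok) {
    # Leave missing values, single tokens and already-formatted names alone
    if (length(tok) < 2 || anyNA(tok)) return(tok[1])
    if (any(grepl(",", tok, fixed = TRUE))) return(paste(tok, collapse = " "))
    n <- length(tok)
    # Keep a trailing kinship suffix attached to the surname, e.g. "Souza Filho"
    n_sur <- if (n > 2 && tolower(tok[n]) %in% suffixes) 2 else 1
    surname <- paste(tok[(n - n_sur + 1):n], collapse = " ")
    given <- tok[seq_len(n - n_sur)]
    given <- given[!tolower(given) %in% particles]
    if (length(given) == 0) return(surname)
    paste0(surname, ", ", paste0(toupper(substr(given, 1, 1)), ".", collapse = ""))
  }, character(1))
}

drop_et_al("A. Silva et al.")
format_initials(c("Joao Antonio Silva", "Jose Carlos Souza Filho", "Maria da Silva"))
# returns c("Silva, J.A.", "Souza Filho, J.C.", "Silva, M.")
```

Treating the last token as the surname is a simplification; the suffix and particle lists only hint at how Portuguese naming conventions complicate that rule.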
Arguments
Argument | Description |
---|---|
df | A data frame containing biodiversity records. |
colname_recordedBy | Column name for the main collector (default: “recordedBy”). |
colname_recordNumber | Column name for the collector number (default: “recordNumber”). |
rm_original_column | Logical; if TRUE, the original columns are removed after cleaning. If FALSE, they are retained as copies whose names end in Original (default: FALSE). |
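The effect of rm_original_column can be pictured with the following sketch. The helper keep_or_drop_originals(), its clean_fun argument, and the exact backup name paste0(<column>, "Original") are assumptions inferred from the table above, not taken from barRoso.

```r
# Hypothetical sketch of the rm_original_column option; NOT barRoso code.
# Backup columns are assumed to be named paste0(<column>, "Original").
keep_or_drop_originals <- function(df, cols, clean_fun, rm_original_column = FALSE) {
  stopifnot(is.data.frame(df), all(cols %in% names(df)))
  for (col in cols) {
    if (!rm_original_column) {
      # Keep an untouched copy of the raw values
      df[[paste0(col, "Original")]] <- df[[col]]
    }
    # Overwrite the working column with the cleaned values
    df[[col]] <- clean_fun(df[[col]])
  }
  df
}

raw <- data.frame(recordedBy = " A.  Silva", recordNumber = "12 ")
tidy <- function(x) trimws(gsub("\\s+", " ", x))
names(keep_or_drop_originals(raw, c("recordedBy", "recordNumber"), tidy))
# returns c("recordedBy", "recordNumber", "recordedByOriginal", "recordNumberOriginal")
```

With rm_original_column = TRUE only the cleaned columns remain, so the raw values can no longer be compared against the cleaned ones after the call.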
Value
A data frame with cleaned and harmonized collector-name fields. A new column, addCollector, is added when additional collectors are identified.
Examples
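library(barRoso)

# Read raw herbarium records; here the collector columns have Portuguese names
df <- read.csv("herbarium_records.csv")

# Clean collector names and numbers, keeping the original columns
df_clean <- std_recordedBy(df,
                           colname_recordedBy = "coletor",
                           colname_recordNumber = "num_coleta",
                           rm_original_column = FALSE)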