barroso_flag_duplicates
Description
Identifies and optionally removes duplicate herbarium specimen records based on collector name and number or, when these are missing, on species name and date. Adds a logical `duplicate` column to indicate flagged duplicates.
Details
This function is part of the internal workflow of the barRoso package, supporting record reconciliation and dataset cleaning. It uses combinations of collector names (`recordedBy`), collection numbers (`recordNumber`), and collection dates (`year`, `month`, `day`) to identify duplicate entries. When `rm_duplicates = TRUE`, one record from each duplicated group is retained, and all others are removed. Specimens missing collector numbers are handled in a separate logic pass using additional fields (`species`, `recordedBy`, `year`, `month`, `day`) to detect duplicates.
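The two-pass logic described above can be sketched in base R. This is only an illustration under stated assumptions, not the barRoso implementation: the helper name `flag_duplicates_sketch`, the way keys are pasted together, and the choice to flag only the second and later records of each group are all hypothetical, and the sample records are made up.

```r
# Hypothetical helper, NOT the barRoso source: a base-R sketch of two-pass
# duplicate flagging. Assumption: within each group of identical keys, the
# first record is kept unflagged and later ones are flagged.
flag_duplicates_sketch <- function(df, rm_duplicates = FALSE) {
  # Records that carry a usable collection number
  has_number <- !is.na(df$recordNumber) & df$recordNumber != ""

  # Pass 1 key: collector, collection number, and collection date
  key_numbered <- paste(df$recordedBy, df$recordNumber,
                        df$year, df$month, df$day, sep = "|")
  # Pass 2 key (no collection number): species, collector, and date
  key_unnumbered <- paste(df$species, df$recordedBy,
                          df$year, df$month, df$day, sep = "|")

  df$duplicate <- FALSE
  df$duplicate[has_number]  <- duplicated(key_numbered[has_number])
  df$duplicate[!has_number] <- duplicated(key_unnumbered[!has_number])

  # Optionally drop the flagged rows, keeping one record per group
  if (rm_duplicates) {
    df <- df[!df$duplicate, , drop = FALSE]
  }
  df
}

# Made-up records for illustration only
specimens <- data.frame(
  species      = c("Inga edulis", "Inga edulis", "Miconia albicans",
                   "Miconia albicans", "Inga vera"),
  recordedBy   = c("Silva, J.", "Silva, J.", "Costa, M.",
                   "Costa, M.", "Silva, J."),
  recordNumber = c("101", "101", NA, NA, "102"),
  year         = c(2001, 2001, 1998, 1998, 2001),
  month        = c(5, 5, 11, 11, 5),
  day          = c(12, 12, 3, 3, 12),
  stringsAsFactors = FALSE
)

flagged <- flag_duplicates_sketch(specimens)
cleaned <- flag_duplicates_sketch(specimens, rm_duplicates = TRUE)
```

In this sketch, rows 2 and 4 are flagged: row 2 repeats the collector, number, and date of row 1, and row 4 has no collection number but repeats the species, collector, and date of row 3. Whether the real function also flags the retained record of each group is not stated here, so check the `duplicate` column on your own data before relying on that behavior.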
Arguments
Argument | Description |
---|---|
df | A data frame with biodiversity specimen records. |
rm_duplicates | Logical; if `TRUE`, removes duplicates and retains one record per duplicated group (default: `FALSE`). |
Value
A data frame with an added `duplicate` column. If `rm_duplicates = TRUE`, duplicated entries are removed so that one record per duplicated group remains.
Examples
df <- read.csv("herbarium_data.csv")
df_flagged <- barroso_flag_duplicates(df)
df_clean <- barroso_flag_duplicates(df, rm_duplicates = TRUE)