barroso_flag_duplicates

Flag or Remove Duplicate Specimens
barRoso::barroso_flag_duplicates()

Description

Identifies and optionally removes duplicate herbarium specimen records based on collector name and collection number or, when the collection number is missing, on species name, collector, and collection date. Adds a logical duplicate column that marks the flagged records.

Details

This function is part of the internal workflow of the barRoso package, supporting record reconciliation and dataset cleaning. It uses combinations of collector names (recordedBy), collection numbers (recordNumber), and collection dates (year, month, day) to identify duplicate entries. When rm_duplicates = TRUE, one record from each duplicated group is retained and all others are removed. Specimens without a collection number are handled in a separate pass that compares additional fields (species, recordedBy, year, month, day) to detect duplicates.
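
The two passes described above can be illustrated with a minimal base R sketch. This is not the barRoso implementation: the helper name sketch_flag_duplicates, the toy records, the choice to keep the first record of each group, and the rule that records with incomplete key fields are never flagged are all assumptions made for illustration.

```r
# Hypothetical helper illustrating the two-pass logic; not barRoso's own code.
sketch_flag_duplicates <- function(df, rm_duplicates = FALSE) {
  # Treat NA and blank strings as a missing collection number
  number <- trimws(as.character(df$recordNumber))
  has_number <- !is.na(number) & nzchar(number)

  # Pass 1 key: collector + collection number
  key_numbered <- paste("N", df$recordedBy, number, sep = "|")
  # Pass 2 key: species + collector + date, for records without a number
  date_cols <- c("species", "recordedBy", "year", "month", "day")
  key_unnumbered <- paste("U", df$species, df$recordedBy,
                          df$year, df$month, df$day, sep = "|")
  key <- ifelse(has_number, key_numbered, key_unnumbered)

  # Assumption: a record with missing key fields is never flagged
  complete <- ifelse(has_number, !is.na(df$recordedBy),
                     complete.cases(df[, date_cols]))
  key[!complete] <- NA

  # Flag every member of a group that has more than one record
  df$duplicate <- !is.na(key) &
    (duplicated(key) | duplicated(key, fromLast = TRUE))

  if (rm_duplicates) {
    # Assumption: keep the first record of each group, drop the rest
    df <- df[is.na(key) | !duplicated(key), , drop = FALSE]
  }
  df
}

# Made-up toy records, for illustration only
records <- data.frame(
  species      = c("Mimosa pudica", "Mimosa pudica", "Inga vera", "Inga vera", "Inga vera"),
  recordedBy   = c("Silva", "Silva", "Lima", "Lima", "Lima"),
  recordNumber = c("101", "101", NA, NA, "7"),
  year         = c(2001, 2001, 1999, 1999, 1999),
  month        = c(5, 5, 3, 3, 3),
  day          = c(12, 12, 8, 8, 9),
  stringsAsFactors = FALSE
)

flagged <- sketch_flag_duplicates(records)
cleaned <- sketch_flag_duplicates(records, rm_duplicates = TRUE)
```

In this sketch the first two records share a collector and number, the next two lack a number but match on species, collector, and date, and the last record is unique. The retained record of each group keeps duplicate = TRUE after removal, which may differ from how barRoso labels it.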

Arguments

Argument        Description
df              A data frame of biodiversity specimen records.
rm_duplicates   Logical; if TRUE, removes duplicates and retains one record per duplicated group (default: FALSE).

Value

A data frame with an added logical duplicate column. If rm_duplicates = TRUE, only one record from each duplicated group is kept and the remaining records are removed.

Examples

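library(barRoso)
df <- read.csv("herbarium_data.csv")
# Flag duplicates in a logical duplicate column; all rows are kept
df_flagged <- barroso_flag_duplicates(df)
# Remove duplicates, keeping one record per duplicated group
df_clean <- barroso_flag_duplicates(df, rm_duplicates = TRUE)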