dropSeq

Removes duplicated accessions of the same species in DNA alignments

catGenes::dropSeq()

Description

This function drops the shorter sequence(s) (those with missing characters) for any species that is represented by multiple accessions in the DNA alignment. You can run dropSeq on multiple DNA alignments, but in that case we recommend running catmultGenes first.
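The idea of keeping only the most complete accession per species can be sketched in base R. This is an illustration only, not the package's implementation: the toy data frame `aln`, its column names, the `Genus_species_accession` label format, and the set of characters counted as missing (`?`, `-`, `N`) are all assumptions made for the example.

```r
# Hypothetical toy alignment; labels are assumed to be "Genus_species_accession".
aln <- data.frame(
  species = c("Ormosia_amazonica_AB001", "Ormosia_amazonica_AB002",
              "Ormosia_coccinea_AB003"),
  sequence = c("ACGT??--NN", "ACGTACGTAC", "ACGTAC??AC"),
  stringsAsFactors = FALSE
)

# Count the characters treated here as missing data ("?", "-", "N").
count_missing <- function(x) {
  nchar(x) - nchar(gsub("[?N-]", "", x))
}

# Strip the trailing accession to recover the species name (assumed format).
species_name <- sub("_[^_]+$", "", aln$species)
n_missing <- count_missing(aln$sequence)

# Sort by species, then by missing count, and keep the first row per species,
# i.e. the accession with the fewest missing characters.
ord <- order(species_name, n_missing)
keep <- ord[!duplicated(species_name[ord])]
dedup <- aln[sort(keep), ]
```

In this toy case, `dedup` retains one row for each of the two species, and the accession of the duplicated species that is mostly missing data is the one dropped.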

Arguments

Argument	Description
…	One NEXUS-formatted gene dataset, or a list of them, as loaded, for example, by ape’s `read.nexus.data`. You can also pass the list of equally sized dataframes containing the input gene datasets, as generated by the function `catmultGenes`.
shortaxlabel	Logical. If `TRUE`, the accession numbers associated with each species or sequence are removed from the final individual gene datasets.

Value

A list of dataframe(s) containing the input DNA alignment(s), where duplicated accessions of the same species are removed.

Examples

library(catGenes)
data(Ormosia)

# Run dropSeq on one or more individual DNA alignments, then save each
# dataset of non-duplicated species using nexusdframe, phylipdframe or fastadframe
df <- dropSeq(Ormosia)
ITS <- df[[1]]
nexusdframe(ITS, file = "filename.nex")


# Run function catmultGenes first
catdf <- catmultGenes(Ormosia,
                      maxspp = TRUE,
                      shortaxlabel = FALSE,
                      missdata = TRUE)

# Run dropSeq on the entire set of concatenated DNA alignments
catdf <- dropSeq(catdf,
                 shortaxlabel = FALSE)

# Then save the concatenated DNA alignment
writeNexus(catdf,
           file = "filename.nex",
           bayesblock = TRUE,
           interleave = TRUE)