catGenes::mineSeq()
Read and download DNA sequences from GenBank
Description
An ape-based function that connects to the GenBank database, reads nucleotide sequences by accession number, and writes them to a FASTA-format file.
Arguments
| Argument | Description |
|---|---|
| inputdf | A dataframe object containing the taxon names in a ‘Species’ column, the voucher information in a ‘Voucher’ column, and the GenBank accessions for each gene in separate columns named after the corresponding gene. If the ‘Species’ and ‘Voucher’ columns are not provided in the dataframe, the function uses the taxonomy of the retrieved sequences as originally available in GenBank. |
| gb.colnames | A vector with column names within the inputdf dataframe corresponding to each gene, where the GenBank accession numbers are listed. |
| as.character | Logical. If FALSE (the default), the sequences are returned as an object of class “DNAbin”; if TRUE, they are returned as single characters. |
| verbose | Logical. If FALSE, the messages showing each step of the GenBank search are not printed in full to the console. |
| save | Logical, if TRUE, the mined sequences will be saved on disk. |
| filename | Name of the output file to be saved. The default is to create a file entitled GenBanK_seqs. |
| dir | The path to the directory where the FASTA-format file of mined DNA sequences will be saved, provided that the argument save is set to TRUE. The default is to create a directory named RESULTS_mineSeq, and the sequences are saved within a subfolder named after the current date. |
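The layout expected for inputdf can be sketched with a small dataframe built by hand. This is only an illustration of the column structure described above: the taxon names, vouchers, and accession strings are hypothetical placeholders, not real GenBank records, so the table is not meant to be passed to GenBank as is.

```r
# Hypothetical input table illustrating the expected columns.
# All values below are placeholders, NOT real taxa or GenBank accessions.
inputdf <- data.frame(
  Species = c("Genus_speciesA", "Genus_speciesB"),
  Voucher = c("Collector_001", "Collector_002"),
  ITS     = c("XX000001", "XX000002"),  # one accession column per gene
  matK    = c("XX000003", "XX000004"),
  stringsAsFactors = FALSE
)

# Column names that hold the accessions, as they would be given to gb.colnames
gb.colnames <- c("ITS", "matK")

# Quick sanity check before mining: every gene column must exist in inputdf
stopifnot(all(gb.colnames %in% names(inputdf)))
```

Checking the column names up front is a cheap way to catch a misspelled gene name before any GenBank query is attempted.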
Value
A list of DNA sequences, each stored as a vector of class ‘DNAbin’ or as a vector of single characters (if as.character = TRUE), with two attributes (species and description).
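To show how such a return value might be inspected without querying GenBank, the sketch below builds a small stand-in object offline with ape. The sequences, names, and attribute values are hypothetical; only the ‘DNAbin’ class and the species and description attributes come from the description above.

```r
library(ape)

# Offline stand-in for a returned object: a 'DNAbin' list of two short,
# made-up sequences. Names and attribute values are hypothetical.
seqs <- as.DNAbin(list(
  seq1 = c("a", "c", "g", "t"),
  seq2 = c("a", "a", "t")
))
attr(seqs, "species")     <- c("Genus_speciesA", "Genus_speciesB")
attr(seqs, "description") <- c("placeholder description 1",
                               "placeholder description 2")

# Inspect the object the way a mined result could be inspected
class(seqs)                # class of the sequence list
names(seqs)                # sequence labels
attr(seqs, "species")      # taxon names attached to the sequences
attr(seqs, "description")  # descriptions attached to the sequences

# Convert to single characters, comparable to as.character = TRUE
seqs_chr <- as.character(seqs)
```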
Examples
library(catGenes)
data(GenBank_accessions)
mineSeq(inputdf = GenBank_accessions,
        gb.colnames = c("ETS", "ITS", "matK", "petBpetD", "trnTF", "Xdh"),
        as.character = FALSE,
        verbose = TRUE,
        save = TRUE,
        filename = "GenBanK_seqs",
        dir = "RESULTS_mineSeq")