catGenes::minePlastome()
Read and download targeted loci from plastome sequences in GenBank
Description
A function built on the rentrez and geneviewer packages that connects to the GenBank database and downloads and parses plastomes. It downloads plastome sequences for the provided accession numbers, extracts and formats the specified target loci, and writes them to a FASTA file.
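The download step described above can be sketched with rentrez directly. This is a minimal illustration, not the internal code of minePlastome: the helper name `fetch_plastome_gb` and its `out_dir` argument are hypothetical, and the `.gb` file naming is an assumption. It relies only on `rentrez::entrez_fetch()`, and calling it requires network access.

```r
# Hypothetical helper (not part of catGenes): fetch one GenBank flat file
# from the nucleotide database and save it locally as <accession>.gb.
fetch_plastome_gb <- function(accession, out_dir = ".") {
  # Accept exactly one accession number per call
  stopifnot(is.character(accession), length(accession) == 1L)

  # "gbwithparts" requests the full GenBank record, including the sequence
  record <- rentrez::entrez_fetch(db = "nuccore",
                                  id = accession,
                                  rettype = "gbwithparts",
                                  retmode = "text")

  # Assumed file naming; minePlastome may name its .gb files differently
  out_file <- file.path(out_dir, paste0(accession, ".gb"))
  writeLines(record, out_file)
  invisible(out_file)
}

# Example call (needs an internet connection), using the Arabidopsis thaliana
# chloroplast genome accession:
# fetch_plastome_gb("NC_000932")
```

Fetching one record per call keeps failures isolated to a single accession, which is convenient when looping over a long vector of plastomes.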
Arguments
| Argument | Description |
|---|---|
| genbank | A vector of GenBank accession numbers for the plastome sequences to be mined for loci. |
| taxon | A vector of taxon names associated with the plastome sequences. If not supplied, the function uses the name recorded for each plastome in GenBank. |
| voucher | A vector of voucher information associated with the plastome sequences. If supplied, it is appended immediately after the taxon name of each downloaded target sequence. |
| CDS | Logical; if TRUE, the targeted loci are treated as protein-coding genes. Otherwise, the function assumes the entered gene names refer to non-coding regions such as introns or intergenic spacers. |
| genes | A vector of one or more gene names as annotated in GenBank. |
| rm_gb_files | Logical, if TRUE, the downloaded .gb files from GenBank will be removed from the directory after extracting the targeted loci. The default is FALSE, keeping the original .gb files. |
| verbose | Logical; if FALSE, the messages reporting each step of the GenBank search are not printed in full to the console. |
| dir | The path to the directory where the mined DNA sequences will be saved as a FASTA file. By default, a directory named RESULTS_minePlastome is created and the sequences are saved in a subfolder named after the current date. |
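The behaviour of the taxon, voucher, and dir arguments can be pictured with the sketch below. Both helpers are hypothetical and are not catGenes internals: the underscore separator, the handling of missing vouchers, and the `%Y-%m-%d` date format are assumptions made for illustration only, and the actual labels and folder names produced by minePlastome may differ.

```r
# Hypothetical helper: build a sequence label from a taxon name and an
# optional voucher, replacing whitespace with underscores (assumed format).
make_label <- function(taxon, voucher = NULL) {
  label <- gsub("\\s+", "_", trimws(taxon))
  if (!is.null(voucher)) {
    v <- gsub("\\s+", "_", trimws(voucher))
    # Append the voucher only where one is actually available
    label <- ifelse(is.na(voucher) | v == "", label,
                    paste(label, v, sep = "_"))
  }
  label
}

# Hypothetical helper: a dated subfolder inside the default results directory.
# The date format used here is an assumption.
default_out_dir <- function(base = "RESULTS_minePlastome", date = Sys.Date()) {
  file.path(base, format(date, "%Y-%m-%d"))
}

# Made-up inputs for illustration only
make_label(c("Genus species", "Genus other"), c("Collector 123", NA))
default_out_dir(date = as.Date("2024-01-15"))
```

Keeping the voucher optional per sequence means a mixed input, where only some plastomes have voucher data, still yields a valid label for every record.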
Examples
library(catGenes)
library(dplyr)
data(GenBank_accessions)
GenBank_plastomes <- GenBank_accessions %>%
  filter(!is.na(Plastome)) %>%
  select(c("Species", "Voucher", "Plastome"))
minePlastome(genbank = GenBank_plastomes$Plastome,
             taxon = GenBank_plastomes$Species,
             voucher = GenBank_plastomes$Voucher,
             CDS = TRUE,
             genes = c("matK", "rbcL"),
             rm_gb_files = FALSE,
             verbose = TRUE,
             dir = "RESULTS_minePlastome")