catGenes::combineFASTA()
Description
This function reads multiple FASTA files and combines their sequences into a single DNAbin object. When save = TRUE, the combined sequences are also written to a single FASTA file in the specified output directory.
Details
The function performs the following steps:
Validates that all specified input files exist
Creates an output directory with a date stamp if save = TRUE
Reads each specified FASTA file using ape::read.FASTA()
Combines all sequences into a single DNAbin object
Saves the combined sequences to the output file if save = TRUE
Returns summary statistics
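The steps above can be sketched with ape directly. This is a minimal illustration under stated assumptions, not the package's implementation: the temporary example files, the `%Y-%m-%d` date format for the subdirectory, and all variable names are made up for the example.

```r
library(ape)

# Two small FASTA files in a temporary directory (hypothetical example data)
tmp <- tempfile("fasta_demo_")
dir.create(tmp)
f1 <- file.path(tmp, "gene1.fasta")
f2 <- file.path(tmp, "gene2.fasta")
writeLines(c(">seqA", "ACGTACGT", ">seqB", "ACGTTTGA"), f1)
writeLines(c(">seqC", "GGCATACG"), f2)
input_files <- c(f1, f2)

# Validate that every input file exists before reading anything
missing_files <- input_files[!file.exists(input_files)]
if (length(missing_files) > 0) {
  stop("Missing input files: ", paste(missing_files, collapse = ", "))
}

# Create a date-stamped output directory (the date format is an assumption)
out_dir <- file.path(tmp, "RESULTS_combineFASTA", format(Sys.Date(), "%Y-%m-%d"))
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Read each file, then concatenate the DNAbin lists into one object
seq_list <- lapply(input_files, read.FASTA)
combined <- do.call(c, seq_list)

# Write the combined sequences to a single FASTA file
out_path <- file.path(out_dir, "combined_sequences.fasta")
write.FASTA(combined, out_path)

# Build a simple per-file summary (the real summary columns may differ)
summary_df <- data.frame(
  file = basename(input_files),
  n_sequences = vapply(seq_list, length, integer(1))
)
print(summary_df)
```

Concatenating with `c()` keeps every sequence in input order, which matches the documented behavior of including all sequences from all files.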
Note: This function does not remove duplicate sequences. All sequences from all input files are included in the output.
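Because duplicates are kept, it can help to inspect the combined object before downstream analysis. The sketch below shows one possible check using ape and base R. It assumes a duplicate means either a repeated sequence name or an identical base string, and `combined` is a small hypothetical object standing in for `result$sequences`.

```r
library(ape)

# Hypothetical combined object: one repeated name, one repeated sequence
combined <- as.DNAbin(list(
  seqA = c("a", "c", "g", "t"),
  seqB = c("a", "c", "g", "t"),
  seqA = c("g", "g", "c", "a")
))

# Names that occur more than once
dup_names <- names(combined)[duplicated(names(combined))]

# Collapse each sequence to a single string and flag repeated content
seq_strings <- vapply(as.character(combined), paste, character(1), collapse = "")
dup_seqs <- names(seq_strings)[duplicated(seq_strings)]

print(dup_names)
print(dup_seqs)
```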
Arguments
| Argument | Description |
|---|---|
| input_files | Character vector specifying the paths to the FASTA files to be combined. This argument is required; there is no default, so the files to combine must be listed explicitly. |
| output_file | Character string specifying the name of the combined output FASTA file. Default is "combined_sequences.fasta". |
| save | Logical. If TRUE (default), the combined sequences are saved to disk in the specified directory. If FALSE, the combined sequences are only returned as an R object without saving. |
| verbose | Logical. If TRUE (default), progress messages are printed to the console. |
| dir | Character string specifying the directory path for saving results. Default is "RESULTS_combineFASTA". When save = TRUE, a subdirectory named with the current date is created within this directory. |
Value
A list containing:
sequences: DNAbin object with all combined sequences
summary: Data frame with summary statistics
output_path: Path to the saved combined FASTA file (if saved)
Examples
# Basic usage: combine specific FASTA files
result <- combineFASTA(input_files = c("file1.fasta", "file2.fasta", "file3.fasta"))
# Combine files with custom output name
result <- combineFASTA(
  input_files = c("data/gene1.fasta", "data/gene2.fasta"),
  output_file = "all_genes.fasta"
)
# Return results without saving to disk
result <- combineFASTA(
  input_files = c("temp1.fasta", "temp2.fasta"),
  save = FALSE,
  verbose = TRUE
)
# Access results
sequences <- result$sequences
summary_stats <- result$summary
print(summary_stats)