Convert among FASTA, NEXUS, and PHYLIP
library(catGenes)
convertAlign(
  filepath = "path_to_alignments",
  format = "NEXUS"
)
Overview
Phylogenetic workflows often involve moving among different alignment formats depending on the downstream software being used. In catGenes, the function convertAlign() provides a simple way to convert one or more DNA alignments among the most commonly used formats:
- FASTA
- NEXUS
- PHYLIP
This is useful when you need to standardize file formats before alignment, concatenation, or phylogenetic analysis.
When to use convertAlign()
Use convertAlign() when:
- your alignments are stored in a format that is not convenient for the next analysis step
- you need to convert FASTA files to NEXUS before loading them into catGenes
- you need to generate PHYLIP files for external phylogenetic programs
- you want to standardize a folder of mixed or differently formatted alignments
In many catGenes workflows, NEXUS is the most convenient format for downstream concatenation because it can be read directly with ape::read.nexus.data().
Basic usage
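A typical conversion workflow loads the package with library(catGenes) and then calls convertAlign() with two arguments: filepath, the directory that contains the alignments, and format, the output format (for example, filepath = "path_to_alignments" and format = "NEXUS").
This call reads one or more alignment files from the specified directory and writes the converted files in NEXUS format.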
Choosing the output format
The argument format defines the output format to be written. Supported values are:
- NEXUS
- FASTA
- PHYLIP
For example, to convert a set of files to FASTA:
convertAlign(
  filepath = "path_to_alignments",
  format = "FASTA"
)
To convert to PHYLIP:
convertAlign(
  filepath = "path_to_alignments",
  format = "PHYLIP"
)
Converting alignments to NEXUS
This is a particularly common step when preparing alignments for concatenation with catGenes.
convertAlign(
  filepath = "RESULTS_alignSeqs/09Mar2026",
  format = "NEXUS"
)
After conversion, the resulting files can be imported into R with ape::read.nexus.data() and organized into a named list for concatenation.
Converting alignments to FASTA
You may also want to export alignments to FASTA, for example when preparing files for manual inspection or downstream tools that accept FASTA more readily.
convertAlign(
  filepath = "path_to_alignments",
  format = "FASTA"
)
Converting alignments to PHYLIP
Some phylogenetic programs and external workflows require PHYLIP format.
convertAlign(
  filepath = "path_to_alignments",
  format = "PHYLIP"
)
This can be useful when preparing alignment files for software outside the immediate catGenes workflow.
Working with a folder of alignments
convertAlign() is designed to work on a directory of input files. This is especially useful when several loci need to be converted in one step.
For example:
convertAlign(
  filepath = "RESULTS_alignSeqs/09Mar2026",
  format = "NEXUS"
)
or
convertAlign(
  filepath = "RESULTS_mineSeq/09Mar2026",
  format = "FASTA"
)
This batch-conversion approach helps standardize a workflow quickly.
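Before a batch conversion, it can help to check which file types a folder actually contains. The sketch below uses only base R. The helper name summarise_extensions() is hypothetical and not part of catGenes, and it relies on file extensions, which may not always reflect the real contents of a file.

```r
# Hypothetical helper (not part of catGenes): count files per extension
# so that a mixed folder can be spotted before calling convertAlign().
summarise_extensions <- function(path) {
  files <- list.files(path)
  ext <- tolower(tools::file_ext(files))
  table(ext)
}

# Demonstration on a temporary folder with empty placeholder files
demo_dir <- tempfile("demo_alignments_")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("ITS.fasta", "matK.fasta", "rbcL.nex")))

ext_counts <- summarise_extensions(demo_dir)
print(ext_counts)
```

If more than one extension shows up, the folder is mixed and is a good candidate for standardizing with a single convertAlign() call.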
Saving converted files in a custom directory
By default, convertAlign() creates a results directory to save the converted files. You can also specify a custom output directory using the dir argument.
convertAlign(
  filepath = "path_to_alignments",
  format = "NEXUS",
  dir = "RESULTS_convertAlign"
)
This is useful when you want all converted outputs written to a specific project folder.
Removing original files
If you want the converted files to replace the original inputs, you can use rmfiles = TRUE.
convertAlign(
  filepath = "path_to_alignments",
  format = "NEXUS",
  rmfiles = TRUE
)
This removes the original input files after conversion. Because deleting original files can be risky, it is usually safer to keep rmfiles = FALSE unless you are sure the converted outputs are correct.
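One cautious alternative is to keep rmfiles = FALSE, confirm that every input has a converted counterpart, and only then delete the originals yourself. The sketch below is a base R illustration under two assumptions that are not confirmed by the catGenes documentation: that converted files keep the base name of their input, and that the outputs in the demonstration use a .nex extension. The helper all_converted() is hypothetical.

```r
# Hypothetical helper (not part of catGenes): TRUE only if every input file
# has a non-empty counterpart with the same base name in the output folder.
# Assumption: converted files keep the base name of the input file.
all_converted <- function(input_dir, output_dir) {
  in_names <- tools::file_path_sans_ext(list.files(input_dir))
  out_files <- list.files(output_dir, full.names = TRUE)
  out_ok <- out_files[file.size(out_files) > 0]
  out_names <- tools::file_path_sans_ext(basename(out_ok))
  length(in_names) > 0 && all(in_names %in% out_names)
}

# Demonstration with temporary folders and placeholder content
in_dir <- tempfile("input_")
out_dir <- tempfile("output_")
dir.create(in_dir)
dir.create(out_dir)
writeLines(c(">sp1", "ACGT"), file.path(in_dir, "ITS.fasta"))
writeLines(c(">sp1", "ACGT"), file.path(in_dir, "matK.fasta"))
writeLines("#NEXUS", file.path(out_dir, "ITS.nex"))

# matK has no converted counterpart yet, so the check fails
incomplete <- all_converted(in_dir, out_dir)

writeLines("#NEXUS", file.path(out_dir, "matK.nex"))
complete <- all_converted(in_dir, out_dir)
print(c(incomplete = incomplete, complete = complete))
```

A check like this only confirms that outputs exist and are non-empty. It does not confirm that the sequences were converted correctly, so opening a few files by hand is still worthwhile.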
Example workflow after sequence alignment
A common use case is to convert aligned files into NEXUS format after a previous alignment step.
For example:
Step 1. Align sequences
alignSeqs(
  filepath = "RESULTS_mineSeq/09Mar2026",
  method = "ClustalW",
  format = "FASTA",
  dir = "RESULTS_alignSeqs"
)
Step 2. Convert the aligned outputs to NEXUS
convertAlign(
  filepath = "RESULTS_alignSeqs/09Mar2026",
  format = "NEXUS",
  dir = "RESULTS_convertAlign"
)
Step 3. Load the NEXUS files into R
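# List the converted files (one per locus)
genes <- list.files("RESULTS_convertAlign/09Mar2026")
my_alignments <- list()
for (i in genes) {
  # Read each NEXUS file and store it under its file name
  my_alignments[[i]] <- ape::read.nexus.data(
    paste0("RESULTS_convertAlign/09Mar2026/", i)
  )
}
# Drop the file extension so that each element is named by locus
names(my_alignments) <- gsub("[.].*", "", names(my_alignments))
This produces a named list ready for concatenation with catGenes.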
Understanding the output directory
When converted files are written to disk, they are typically saved in a directory such as:
RESULTS_convertAlign/09Mar2026/
This date-based structure helps keep conversion results organized across runs.
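If a script needs to refer to the folder for the current run, the path can be built rather than typed by hand. The sketch below assumes, based only on the example path above, that the subfolder is named as two-digit day, abbreviated month, and four-digit year. The helper run_dir() is hypothetical, and the exact naming used by convertAlign() should be checked against the folder it actually creates.

```r
# Hypothetical helper (not part of catGenes).
# Assumption: the run subfolder follows the pattern seen in "09Mar2026".
# Month abbreviations produced by "%b" depend on the current locale.
run_dir <- function(base_dir, date = Sys.Date()) {
  file.path(base_dir, format(date, "%d%b%Y"))
}

todays_dir <- run_dir("RESULTS_convertAlign")
print(todays_dir)
```

Passing an explicit date, for example run_dir("RESULTS_convertAlign", as.Date("2026-03-09")), points to the folder of an earlier run.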
Recommended file naming
Before conversion, it is helpful if files are already named clearly by locus, for example:
ITS.fasta
matK.fasta
rbcL.fasta
After conversion, these names usually remain easy to interpret and help keep downstream list names and partition labels clear.
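File names matter because the loading code in this tutorial derives list names from them by removing everything from the first dot onward. The short base R sketch below shows that mapping on the example names, plus one caveat with made-up file names that contain an extra dot.

```r
files <- c("ITS.fasta", "matK.fasta", "rbcL.fasta")

# Same pattern as in the tutorial's loading loop:
# drop everything from the first dot onward
locus_names <- gsub("[.].*", "", files)
print(locus_names)

# Caveat (illustrative file names): a dot inside the locus label is also
# cut at the first dot, so two versions collapse to the same name.
versioned <- c("ITS.v1.fasta", "ITS.v2.fasta")
collapsed <- gsub("[.].*", "", versioned)
print(collapsed)
print(anyDuplicated(collapsed) > 0)
```

Using an underscore instead of a dot inside locus labels (for example ITS_v1.fasta) avoids this collision.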
Loading converted files into R
If you convert files to NEXUS, they can then be loaded into R with ape::read.nexus.data().
# List the converted files (one per locus)
genes <- list.files("RESULTS_convertAlign/09Mar2026")
my_alignments <- list()
for (i in genes) {
  # Read each NEXUS file and store it under its file name
  my_alignments[[i]] <- ape::read.nexus.data(
    paste0("RESULTS_convertAlign/09Mar2026/", i)
  )
}
# Drop the file extension so that each element is named by locus
names(my_alignments) <- gsub("[.].*", "", names(my_alignments))
This creates the standard input structure for catGenes concatenation functions.
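The same loading pattern can be wrapped in a small function that also guards against two easy mistakes: an empty folder and duplicated locus names. This is a sketch, not a catGenes function. The name load_named_list() and its reader argument are hypothetical. In a real workflow the reader would be ape::read.nexus.data, while the demonstration uses readLines() as a stand-in so that it runs without ape installed.

```r
# Hypothetical helper (not part of catGenes). `reader` is any function that
# takes a file path; in a real workflow it would be ape::read.nexus.data.
load_named_list <- function(dir, reader) {
  files <- list.files(dir, full.names = TRUE)
  if (length(files) == 0) {
    stop("No files found in ", dir)
  }
  loci <- gsub("[.].*", "", basename(files))
  if (anyDuplicated(loci) > 0) {
    stop("Duplicated locus names: ",
         paste(unique(loci[duplicated(loci)]), collapse = ", "))
  }
  stats::setNames(lapply(files, reader), loci)
}

# Demonstration with readLines() as a stand-in reader
demo_dir <- tempfile("nexus_")
dir.create(demo_dir)
writeLines("#NEXUS", file.path(demo_dir, "ITS.nex"))
writeLines("#NEXUS", file.path(demo_dir, "matK.nex"))

demo_list <- load_named_list(demo_dir, reader = readLines)
print(names(demo_list))
```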
Typical workflow after conversion
Once files are converted and loaded into R, the next step is usually to concatenate the loci. For datasets with one sequence per species per locus:
catdf <- catfullGenes(
  my_alignments,
  shortaxlabel = TRUE,
  missdata = TRUE
)
For datasets containing duplicated taxa or multiple accessions:
catdf <- catmultGenes(
  my_alignments,
  maxspp = TRUE,
  shortaxlabel = TRUE,
  missdata = TRUE
)
Common issues
Input files are not in a recognizable alignment format
convertAlign() expects standard alignment files in formats that can be interpreted correctly. If the input files are malformed, conversion may fail.
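A quick way to spot malformed or mislabeled inputs is to look at the first non-empty line of each file: FASTA records begin with ">", NEXUS files begin with "#NEXUS", and PHYLIP files begin with two integers giving the number of taxa and sites. The base R sketch below applies that heuristic. The helper guess_alignment_format() is hypothetical, is not how convertAlign() detects formats, and will not recognize every variant of these formats.

```r
# Hypothetical helper (not part of catGenes): guess the format from the
# first non-empty line of a file. Returns NA when nothing matches.
guess_alignment_format <- function(file) {
  lines <- trimws(readLines(file, n = 20, warn = FALSE))
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) return(NA_character_)
  first <- lines[1]
  if (startsWith(first, ">")) return("FASTA")
  if (grepl("^#NEXUS", first, ignore.case = TRUE)) return("NEXUS")
  if (grepl("^[0-9]+[[:space:]]+[0-9]+$", first)) return("PHYLIP")
  NA_character_
}

# Demonstration with small temporary files
fasta_file <- tempfile(fileext = ".fasta")
writeLines(c(">sp1", "ACGT", ">sp2", "ACGA"), fasta_file)
phylip_file <- tempfile(fileext = ".phy")
writeLines(c(" 2 4", "sp1  ACGT", "sp2  ACGA"), phylip_file)
broken_file <- tempfile(fileext = ".txt")
writeLines("not an alignment", broken_file)

guesses <- c(
  guess_alignment_format(fasta_file),
  guess_alignment_format(phylip_file),
  guess_alignment_format(broken_file)
)
print(guesses)
```

Files that come back as NA are worth opening by hand before they are passed to convertAlign().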
The wrong output format is chosen
Different downstream tools expect different formats. If your next step is concatenation with catGenes, NEXUS is usually the best choice.
Original files are deleted unintentionally
If rmfiles = TRUE, the input files are removed after conversion. Use this option carefully and only when you are confident the converted files are correct.
File names remain difficult to interpret
Conversion changes file format, not naming conventions. If file names are unclear before conversion, they may still be unclear afterward. Rename files when needed so that each locus remains easy to identify.
Recommended practice
For the smoothest workflow:
- keep one file per locus
- use simple and consistent file names
- convert alignments to NEXUS before loading them into catGenes
- keep the original files unless you are sure the conversion results are correct
- inspect converted outputs before moving to concatenation
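As one example of inspecting converted outputs, the sketch below checks that every sequence within a locus has the same length, which is a basic property of an alignment. It assumes each parsed alignment is a named list with one character vector of sites per taxon, the structure returned by ape::read.nexus.data(). The helper is_aligned() is hypothetical, and toy data are used so that the example runs without any input files.

```r
# Hypothetical helper (not part of catGenes): TRUE if all sequences in one
# parsed alignment have the same number of sites.
is_aligned <- function(aln) {
  length(aln) > 0 && length(unique(lengths(aln))) == 1
}

# Toy data standing in for parsed alignments
good <- list(sp1 = c("a", "c", "g", "t"), sp2 = c("a", "c", "g", "-"))
bad  <- list(sp1 = c("a", "c", "g", "t"), sp2 = c("a", "c", "g"))

my_toy_alignments <- list(ITS = good, matK = bad)
check <- vapply(my_toy_alignments, is_aligned, logical(1))
print(check)
```

Applied to a real named list of loci, any FALSE entry points to a locus that should be re-examined before concatenation.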
Next step
Once your alignments have been converted into the desired format, the next step is usually to:
- load them into R
- organize them as a named list
- concatenate loci with catfullGenes() or catmultGenes()
See the next tutorials for concatenating multilocus datasets with catGenes.