Convert among FASTA, NEXUS, and PHYLIP

Overview

Phylogenetic workflows often involve moving among different alignment formats depending on the downstream software being used. In catGenes, the function convertAlign() provides a simple way to convert one or more DNA alignments among the most commonly used formats:

  • FASTA
  • NEXUS
  • PHYLIP

This is useful when you need to standardize file formats before alignment, concatenation, or phylogenetic analysis.
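
To make the difference among the three formats concrete, the sketch below writes the same toy alignment as FASTA, NEXUS, and relaxed sequential PHYLIP text using base R only. The taxon names and sequences are made up, and the code only illustrates the file layouts; it is not how convertAlign() works internally.

```r
# Toy alignment: two taxa, eight aligned sites (made-up data).
aln <- c(Taxon_A = "ACGT-ACG", Taxon_B = "ACGTTACG")

# FASTA: a ">" header line followed by the sequence.
fasta <- as.vector(rbind(paste0(">", names(aln)), aln))

# NEXUS: a "#NEXUS" header and a DATA block with dimensions and a matrix.
nexus <- c(
  "#NEXUS",
  "BEGIN DATA;",
  sprintf("  DIMENSIONS NTAX=%d NCHAR=%d;", length(aln), nchar(aln[[1]])),
  "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
  "  MATRIX",
  paste0("  ", names(aln), "  ", aln),
  "  ;",
  "END;"
)

# PHYLIP (relaxed, sequential): the first line gives taxa and sites.
phylip <- c(
  paste(length(aln), nchar(aln[[1]])),
  paste(names(aln), aln)
)

writeLines(fasta)
writeLines(nexus)
writeLines(phylip)
```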

When to use convertAlign()

Use convertAlign() when:

  • your alignments are stored in a format that is not convenient for the next analysis step
  • you need to convert FASTA files to NEXUS before loading them into catGenes
  • you need to generate PHYLIP files for external phylogenetic programs
  • you want to standardize a folder of mixed or differently formatted alignments

In many catGenes workflows, NEXUS is the most convenient format for downstream concatenation because it can be read directly with ape::read.nexus.data().

Basic usage

A typical conversion workflow looks like this:

library(catGenes)

convertAlign(
  filepath = "path_to_alignments",
  format = "NEXUS"
)

This reads one or more alignment files from the specified directory and writes the converted files in NEXUS format.

Choosing the output format

The argument format defines the output format to be written. Supported values are:

  • NEXUS
  • FASTA
  • PHYLIP

For example, to convert a set of files to FASTA:

convertAlign(
  filepath = "path_to_alignments",
  format = "FASTA"
)

To convert to PHYLIP:

convertAlign(
  filepath = "path_to_alignments",
  format = "PHYLIP"
)
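
If the output format comes from user input or a script parameter, it can help to validate it before calling convertAlign(). The helper below is a hypothetical sketch (check_format() is not part of catGenes): it upper-cases the value and checks it against the three supported names listed above. Whether convertAlign() itself accepts lower-case values is not assumed here.

```r
# Hypothetical helper (not part of catGenes): normalize and validate
# the output format before passing it to convertAlign().
check_format <- function(format) {
  match.arg(toupper(format), c("NEXUS", "FASTA", "PHYLIP"))
}

check_format("nexus")

# Possible usage:
# convertAlign(filepath = "path_to_alignments", format = check_format("nexus"))
```

An unsupported value such as "clustal" stops with an error instead of reaching the conversion step.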

Converting alignments to NEXUS

This is a particularly common step when preparing alignments for concatenation with catGenes.

convertAlign(
  filepath = "RESULTS_alignSeqs/09Mar2026",
  format = "NEXUS"
)

After conversion, the resulting files can be imported into R with ape::read.nexus.data() and organized into a named list for concatenation.

Converting alignments to FASTA

You may also want to export alignments to FASTA, for example when preparing files for manual inspection or downstream tools that accept FASTA more readily.

convertAlign(
  filepath = "path_to_alignments",
  format = "FASTA"
)

Converting alignments to PHYLIP

Some phylogenetic programs and external workflows require PHYLIP format.

convertAlign(
  filepath = "path_to_alignments",
  format = "PHYLIP"
)

This can be useful when preparing alignment files for software outside the immediate catGenes workflow.

Working with a folder of alignments

convertAlign() is designed to work on a directory of input files. This is especially useful when several loci need to be converted in one step.

For example:

convertAlign(
  filepath = "RESULTS_alignSeqs/09Mar2026",
  format = "NEXUS"
)

or

convertAlign(
  filepath = "RESULTS_mineSeq/09Mar2026",
  format = "FASTA"
)

This batch-conversion approach helps standardize a workflow quickly.
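
Before converting a whole folder, it can be worth checking what the folder actually contains. The sketch below uses base R only; summarize_inputs() is a hypothetical helper, and the demonstration folder and file names are made up.

```r
# Hypothetical helper: count the file extensions found in a folder.
summarize_inputs <- function(filepath) {
  files <- list.files(filepath)
  ext <- tolower(tools::file_ext(files))
  table(extension = ext)
}

# Demonstration on a temporary folder with made-up, empty files.
demo_dir <- tempfile("demo_alignments")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("ITS.fasta", "matK.fasta", "rbcL.nex")))

summarize_inputs(demo_dir)
```

A folder that mixes several extensions is a good candidate for standardizing with a single convertAlign() call.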

Saving converted files in a custom directory

By default, convertAlign() creates a results directory to save the converted files. You can also specify a custom output directory using the dir argument.

convertAlign(
  filepath = "path_to_alignments",
  format = "NEXUS",
  dir = "RESULTS_convertAlign"
)

This is useful when you want all converted outputs written to a specific project folder.

Removing original files

If you want the converted files to replace the original inputs, you can use rmfiles = TRUE.

convertAlign(
  filepath = "path_to_alignments",
  format = "NEXUS",
  rmfiles = TRUE
)

This removes the original input files after conversion. Because deleting original files can be risky, it is usually safer to keep rmfiles = FALSE unless you are sure the converted outputs are correct.
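
One cautious pattern is to copy the inputs to a backup folder before running a conversion with rmfiles = TRUE. The sketch below uses base R only; backup_inputs() and the "_backup" suffix are hypothetical choices, not catGenes features.

```r
# Hypothetical helper: copy every file in `filepath` to a backup folder.
backup_inputs <- function(filepath, backup_dir = paste0(filepath, "_backup")) {
  dir.create(backup_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(filepath, full.names = TRUE)
  files <- files[!dir.exists(files)]  # skip subfolders, copy files only
  # overwrite = FALSE: an existing backup copy is never replaced
  copied <- file.copy(files, backup_dir, overwrite = FALSE)
  stopifnot(all(copied))  # stop if any file could not be copied
  invisible(backup_dir)
}

# Possible usage:
# backup_inputs("path_to_alignments")
# convertAlign(filepath = "path_to_alignments", format = "NEXUS", rmfiles = TRUE)
```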

Example workflow after sequence alignment

A common use case is to convert aligned files into NEXUS format after a previous alignment step.

For example:

Step 1. Align sequences

alignSeqs(
  filepath = "RESULTS_mineSeq/09Mar2026",
  method = "ClustalW",
  format = "FASTA",
  dir = "RESULTS_alignSeqs"
)

Step 2. Convert the aligned outputs to NEXUS

convertAlign(
  filepath = "RESULTS_alignSeqs/09Mar2026",
  format = "NEXUS",
  dir = "RESULTS_convertAlign"
)

Step 3. Load the NEXUS files into R

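# List the converted files in the output folder
genes <- list.files("RESULTS_convertAlign/09Mar2026")
my_alignments <- list()

# Read each file and store it under its file name
for (i in genes) {
  my_alignments[[i]] <- ape::read.nexus.data(
    paste0("RESULTS_convertAlign/09Mar2026/", i)
  )
}

# Keep only the part of each name before the first dot
names(my_alignments) <- gsub("[.].*", "", names(my_alignments))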

This produces a named list ready for concatenation with catGenes.
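
The same loading step can be wrapped in a small function. This is a sketch, not a catGenes function: load_alignments() and its reader argument are hypothetical, with the reader defaulting to ape::read.nexus.data() as used in this tutorial. Note one difference from the loop above: tools::file_path_sans_ext() removes only the final extension, whereas gsub("[.].*", "", ...) removes everything after the first dot.

```r
# Hypothetical helper: read every file in `dir` into a named list.
# `reader` defaults to ape::read.nexus.data(), which requires the ape package.
load_alignments <- function(dir, reader = ape::read.nexus.data) {
  files <- list.files(dir, full.names = TRUE)
  alignments <- lapply(files, reader)
  # Name each element after its file, dropping the final extension.
  names(alignments) <- tools::file_path_sans_ext(basename(files))
  alignments
}

# Possible usage:
# my_alignments <- load_alignments("RESULTS_convertAlign/09Mar2026")
```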

Understanding the output directory

When converted files are written to disk, they are typically saved in a directory such as:

RESULTS_convertAlign/09Mar2026/

The subfolder name encodes a date (here 09Mar2026), which helps keep conversion results organized across runs.
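
If you need to build such a path in a script, a date tag in the same style can be produced with base R. This sketch assumes the folder name follows a day, abbreviated month, and year pattern like the example above; check your own output folder to confirm, because the exact naming rule is an assumption here.

```r
# Assumption: the subfolder name looks like "09Mar2026"
# (day, abbreviated month, four-digit year).
run_date <- as.Date("2026-03-09")   # use Sys.Date() for today's run
date_tag <- format(run_date, "%d%b%Y")  # month abbreviation depends on locale
out_dir <- file.path("RESULTS_convertAlign", date_tag)
out_dir
```

Because the month abbreviation is locale-dependent, a non-English locale can produce a different tag than the folder name shown above.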

Loading converted files into R

If you convert files to NEXUS, they can then be loaded into R with ape::read.nexus.data().

genes <- list.files("RESULTS_convertAlign/09Mar2026")
my_alignments <- list()

for (i in genes) {
  my_alignments[[i]] <- ape::read.nexus.data(
    paste0("RESULTS_convertAlign/09Mar2026/", i)
  )
}

names(my_alignments) <- gsub("[.].*", "", names(my_alignments))

This creates the standard input structure for catGenes concatenation functions.
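
Before concatenating, a quick sanity check on the list can catch problems early. The sketch below checks that every sequence within a locus has the same number of sites, as expected for an alignment. is_aligned() is a hypothetical helper, and the toy list only mimics the structure returned by ape::read.nexus.data(), where each taxon holds one character vector of sites.

```r
# Hypothetical helper: TRUE for each locus whose sequences all have the
# same number of sites.
is_aligned <- function(alignments) {
  vapply(
    alignments,
    function(locus) length(unique(lengths(locus))) == 1,
    logical(1)
  )
}

# Toy list with made-up data; matK is deliberately uneven.
toy <- list(
  ITS  = list(Taxon_A = c("a", "c", "g"), Taxon_B = c("a", "c", "-")),
  matK = list(Taxon_A = c("a", "t"),      Taxon_B = c("a", "t", "g"))
)

is_aligned(toy)
```

Here ITS passes the check and matK does not, which would point to a file worth inspecting before concatenation.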

Typical workflow after conversion

Once files are converted and loaded into R, the next step is usually to concatenate the loci. For datasets with one sequence per species per locus:

catdf <- catfullGenes(
  my_alignments,
  shortaxlabel = TRUE,
  missdata = TRUE
)

For datasets containing duplicated taxa or multiple accessions:

catdf <- catmultGenes(
  my_alignments,
  maxspp = TRUE,
  shortaxlabel = TRUE,
  missdata = TRUE
)

Common issues

Input files are not in a recognizable alignment format

convertAlign() expects well-formed alignment files in one of the supported formats (FASTA, NEXUS, or PHYLIP). If an input file is malformed, conversion may fail.
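
A rough pre-check can flag files that do not look like any of the three formats. The sketch below guesses the format from the first non-empty line: FASTA records start with ">", NEXUS files start with "#NEXUS", and PHYLIP files start with the number of taxa and the number of sites. guess_format() is a hypothetical helper, the rules are deliberately simple, and it is not how convertAlign() detects formats.

```r
# Hypothetical helper: guess the format of a file from its first
# non-empty line. Returns NA when the line matches none of the patterns.
guess_format <- function(file) {
  lines <- trimws(readLines(file, n = 50, warn = FALSE))
  first <- lines[nzchar(lines)][1]
  if (is.na(first)) return(NA_character_)
  if (startsWith(first, ">")) return("FASTA")
  if (grepl("^#NEXUS", first, ignore.case = TRUE)) return("NEXUS")
  if (grepl("^[0-9]+[[:space:]]+[0-9]+$", first)) return("PHYLIP")
  NA_character_
}

# Demonstration with a made-up temporary file.
demo_file <- tempfile(fileext = ".fasta")
writeLines(c(">Taxon_A", "ACGT-ACG"), demo_file)
guess_format(demo_file)
```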

The wrong output format is chosen

Different downstream tools expect different formats. If your next step is concatenation with catGenes, NEXUS is usually the best choice.

Original files are deleted unintentionally

If rmfiles = TRUE, the input files are removed after conversion. Use this option carefully and only when you are confident the converted files are correct.

File names remain difficult to interpret

Conversion changes file format, not naming conventions. If file names are unclear before conversion, they may still be unclear afterward. Rename files when needed so that each locus remains easy to identify.
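
Renaming can be scripted with base R before conversion. In the sketch below, rename_loci() and the file names in the mapping are hypothetical; the function refuses to run if a source file is missing or a target name already exists.

```r
# Hypothetical mapping from unclear file names to locus names.
new_names <- c("aln1.fasta" = "ITS.fasta", "aln2.fasta" = "matK.fasta")

# Hypothetical helper: rename files inside `filepath` using the mapping.
rename_loci <- function(filepath, mapping) {
  from <- file.path(filepath, names(mapping))
  to <- file.path(filepath, unname(mapping))
  # Stop if a source file is missing or a target would be overwritten.
  stopifnot(all(file.exists(from)), !any(file.exists(to)))
  invisible(file.rename(from, to))
}

# Possible usage:
# rename_loci("path_to_alignments", new_names)
```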

Next step

Once your alignments have been converted into the desired format, the next step is usually to:

  • load them into R
  • organize them as a named list
  • concatenate loci with catfullGenes() or catmultGenes()

See the next tutorials for concatenating multilocus datasets with catGenes.