Model selection and MrBayes preparation

Overview

After concatenating multilocus alignments and exporting the final dataset, a common next step is to determine suitable substitution models for each partition and prepare the dataset for Bayesian phylogenetic analysis. In catGenes, this workflow is supported by evomodelTest().

The function evomodelTest() performs evolutionary model selection using phangorn::modelTest(), identifies the best-fitting model according to an information criterion, and can generate MrBayes command blocks for downstream analysis. When desired, it can also append these commands directly to NEXUS files.
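
To make the underlying step concrete, the sketch below calls phangorn::modelTest() directly on the Laurasiatherian example alignment that ships with phangorn, then picks the row with the lowest BIC. It only illustrates the kind of comparison that evomodelTest() automates. It is not the function's internal code, and the helper name best_by_criterion() is hypothetical.

```r
# Sketch only: direct use of phangorn::modelTest(), not catGenes internals.
# best_by_criterion() is a hypothetical helper introduced for illustration.
best_by_criterion <- function(fit_table, criterion = "BIC") {
  # modelTest() returns a data frame with one row per candidate model
  # and columns that include Model, logLik, AIC and BIC.
  fit_table[which.min(fit_table[[criterion]]), , drop = FALSE]
}

if (requireNamespace("phangorn", quietly = TRUE)) {
  data("Laurasiatherian", package = "phangorn")  # example phyDat alignment
  fits <- phangorn::modelTest(
    Laurasiatherian,
    model = c("JC", "HKY", "GTR"),
    G = TRUE,                                   # also fit +G variants
    I = TRUE,                                   # also fit +I variants
    control = phangorn::pml.control(trace = 0)  # silence optimizer output
  )
  print(best_by_criterion(fits, "BIC"))
}
```

Swapping "BIC" for "AIC" in the last call selects by the other criterion without refitting any model.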

This article explains how to use evomodelTest() for model selection and MrBayes preparation, how its main arguments affect the output, and how it fits into a broader catGenes phylogenetic workflow.

When to use evomodelTest()

Use evomodelTest() when:

  • you already have one or more aligned NEXUS files
  • you want to compare nucleotide substitution models for each locus
  • you want to select models according to AIC or BIC
  • you want to generate a MrBayes block automatically
  • you want to prepare a combined partitioned dataset for Bayesian analysis

This function is typically used after:

  1. aligning loci
  2. loading them into R
  3. concatenating the dataset
  4. exporting the aligned partitions or final concatenated matrix as NEXUS

Relationship with the broader workflow

A common catGenes workflow is:

  1. retrieve or mine sequences
  2. align loci
  3. concatenate the dataset with catfullGenes() or catmultGenes()
  4. export the final matrix with writeNexus()
  5. use evomodelTest() to select models and prepare a MrBayes block
  6. run MrBayes with mrbayesRun()

This means evomodelTest() serves as the bridge between dataset preparation and Bayesian phylogenetic inference.

Basic usage

A simple model-selection workflow looks like this:

library(catGenes)

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  model_criteria = "BIC"
)

This evaluates substitution models for the provided NEXUS alignment and generates a report of model-selection results.

Required input

The main input is nexus_file_path, which points to one or more NEXUS alignment files.

For example:

evomodelTest(
  nexus_file_path = "path/to/ITS.nex"
)

or, for multiple locus files:

evomodelTest(
  nexus_file_path = c("path/to/ITS.nex",
                      "path/to/matK.nex",
                      "path/to/rbcL.nex")
)

These can be individual alignments that you later want to use in a partitioned Bayesian analysis.
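
When the locus files live in one folder, the vector of paths can be built programmatically instead of typed by hand. The sketch below assumes a hypothetical folder named "alignments" and a hypothetical helper collect_nexus(); neither is part of catGenes.

```r
# Sketch: gather every .nex file found in a folder.
# collect_nexus() and the "alignments" folder name are hypothetical.
collect_nexus <- function(folder) {
  files <- list.files(folder, pattern = "\\.nex$", full.names = TRUE,
                      ignore.case = TRUE)
  if (length(files) == 0) {
    stop("No .nex files found in: ", folder)
  }
  files
}

# Possible usage (not run here):
# locus_files <- collect_nexus("alignments")
# evomodelTest(nexus_file_path = locus_files)
```

Failing early when the folder is empty avoids passing an empty vector to the model-selection step.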

Choosing the information criterion

The argument model_criteria controls how the best model is chosen. Supported options include:

  • "AIC"
  • "BIC"

For example, using AIC:

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  model_criteria = "AIC"
)

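For all but the smallest alignments, BIC penalizes each additional parameter more heavily than AIC, so it tends to favor simpler models and is commonly preferred in phylogenetic workflows. Both criteria are valid, and the choice depends on analytical preference.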
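
A small worked example shows how the two criteria can disagree. It uses the standard definitions AIC = 2k - 2 lnL and BIC = k ln(n) - 2 lnL, with n taken as the number of alignment sites (a common convention). The log-likelihoods, parameter counts and alignment length below are hypothetical values chosen for illustration, not results from any real dataset.

```r
# Hypothetical values for illustration only; not results from a real dataset.
n_sites <- 1000
fits <- data.frame(
  model  = c("HKY+G", "GTR+G+I"),
  logLik = c(-5000, -4990),   # hypothetical log-likelihoods
  k      = c(10, 18),         # hypothetical numbers of free parameters
  stringsAsFactors = FALSE
)

fits$AIC <- 2 * fits$k - 2 * fits$logLik
fits$BIC <- log(n_sites) * fits$k - 2 * fits$logLik

best_aic <- fits$model[which.min(fits$AIC)]  # favors the richer model here
best_bic <- fits$model[which.min(fits$BIC)]  # favors the simpler model here
print(fits)
```

With these numbers the eight extra parameters cost 16 AIC units but about 55 BIC units, while the likelihood gain is worth only 20 units, so AIC selects the richer model and BIC the simpler one.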

Choosing which models to test

The argument models_to_test controls the set of substitution models included in the comparison. You can test all available models:

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  models_to_test = "all"
)

Or a standard subset:

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  models_to_test = "standard"
)

You can also specify a custom vector of models:

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  models_to_test = c("JC", "HKY", "GTR")
)

This is useful when you want to restrict testing to a smaller, interpretable set of candidate models.

Including gamma rate heterogeneity

The argument include_G controls whether models with gamma-distributed among-site rate variation (+G) are included in the comparison.

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  include_G = TRUE
)

Including gamma-distributed rate heterogeneity is often appropriate for empirical DNA datasets.

Including invariant sites

The argument include_I controls whether models including a proportion of invariant sites are tested.

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  include_I = TRUE
)

The combination of +G and +I options can have a substantial effect on model choice, so it is useful to make these decisions explicitly.
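
To see why these two switches matter, it helps to count the candidates they create. The sketch below uses a hypothetical helper, candidate_models(), and assumes that enabling both options adds +G, +I and +G+I variants of every base model. The labels are illustrative and may not match the exact names printed in the report.

```r
# Sketch: how +G and +I multiply the candidate set.
# candidate_models() is a hypothetical helper; the labels it builds are
# illustrative and may differ from the names used in the actual report.
candidate_models <- function(base_models, include_G = TRUE, include_I = TRUE) {
  suffixes <- ""
  if (include_G) suffixes <- c(suffixes, "+G")
  if (include_I) suffixes <- c(suffixes, "+I")
  if (include_G && include_I) suffixes <- c(suffixes, "+G+I")
  # Every base model is combined with every suffix.
  as.vector(outer(base_models, suffixes, paste0))
}

print(candidate_models(c("JC", "HKY", "GTR")))
print(candidate_models(c("JC", "HKY", "GTR"), include_I = FALSE))
```

Under this assumption, three base models become twelve candidates with both options enabled and six with only +G, which also affects runtime.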

Parallel computation

The argument mc.cores controls the number of cores used during model testing.

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  mc.cores = 4
)

Using multiple cores can speed up model testing, especially when working with multiple loci or larger alignments.
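
Rather than hard-coding a core count, it can be derived from the machine. The sketch below uses parallel::detectCores() and leaves one core free; pick_cores() is a hypothetical helper, not part of catGenes.

```r
# Sketch: choose a core count that leaves one core free for the system.
# pick_cores() is a hypothetical helper, not part of catGenes.
pick_cores <- function(requested = NULL) {
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L   # detectCores() can return NA
  usable <- max(1L, available - 1L)
  if (is.null(requested)) {
    usable
  } else {
    # Never exceed what the machine offers, never go below one core.
    max(1L, min(as.integer(requested), usable))
  }
}

# Possible usage (not run here):
# evomodelTest(nexus_file_path = "Vataireoids.nex", mc.cores = pick_cores(4))
```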

Appending a MrBayes block to NEXUS files

One of the main strengths of evomodelTest() is that it can automatically generate and append MrBayes commands to the input NEXUS file.

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  append_mrbayes_to_nexus = TRUE
)

This is convenient because it prepares the alignment for direct downstream use with MrBayes. If you do not want to append the block:

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  append_mrbayes_to_nexus = FALSE
)

Overwriting the original NEXUS file

The argument overwrite_original_nexus controls whether the original NEXUS file should be replaced when the MrBayes block is appended. To overwrite the original file:

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  append_mrbayes_to_nexus = TRUE,
  overwrite_original_nexus = TRUE
)

To preserve the original file and write a modified copy instead:

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  append_mrbayes_to_nexus = TRUE,
  overwrite_original_nexus = FALSE
)

Keeping the original file unchanged is often safer, especially when testing alternative model-selection settings.
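
If overwriting is still preferred, a manual backup taken beforehand gives the same safety. The sketch below copies the file under a timestamped name using base R; backup_file() is a hypothetical helper, not part of catGenes.

```r
# Sketch: keep a timestamped copy before allowing an in-place overwrite.
# backup_file() is a hypothetical helper, not part of catGenes.
backup_file <- function(path) {
  stopifnot(file.exists(path))
  stamp  <- format(Sys.time(), "%Y%m%d_%H%M%S")
  backup <- paste0(tools::file_path_sans_ext(path), "_backup_", stamp, ".",
                   tools::file_ext(path))
  ok <- file.copy(path, backup, overwrite = FALSE)
  if (!ok) stop("Could not create backup: ", backup)
  backup  # return the path of the copy
}

# Possible usage (not run here):
# backup_file("Vataireoids.nex")
# evomodelTest(nexus_file_path = "Vataireoids.nex",
#              append_mrbayes_to_nexus = TRUE,
#              overwrite_original_nexus = TRUE)
```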

Preparing a combined partitioned MrBayes block

If you already have a concatenated NEXUS matrix and want to append a complete partitioned MrBayes block to it, use combined_nexus_file_path. For example:

evomodelTest(
  nexus_file_path = c("ITS.nex", "matK.nex", "rbcL.nex"),
  combined_nexus_file_path = "combined_dataset.nex",
  append_mrbayes_to_nexus = TRUE
)

This allows model selection to be performed on the individual loci while the final combined MrBayes block is added to the concatenated partitioned matrix.

This is one of the most useful workflows when moving from locus-specific alignments to a partitioned Bayesian analysis.
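
For readers unfamiliar with partitioned MrBayes blocks, the sketch below builds one from a table of per-locus models. It is an illustration under stated assumptions and not the block that evomodelTest() actually writes. The helper mrbayes_block(), the partition name by_locus, the locus coordinates and the selected models are all hypothetical. The mapping of model names to nst and rates follows standard MrBayes usage.

```r
# Sketch under stated assumptions: NOT the block evomodelTest() writes.
# mrbayes_block() and the locus table below are hypothetical illustrations.
mrbayes_block <- function(loci) {
  # loci: data frame with columns name, start, end, model (e.g. "GTR+G+I")
  base <- sub("\\+.*$", "", loci$model)
  # nst = 1 (JC, F81), 2 (K80, HKY); anything else is treated as nst = 6
  nst <- ifelse(base %in% c("JC", "F81"), 1L,
         ifelse(base %in% c("K80", "HKY"), 2L, 6L))
  has_G <- grepl("+G", loci$model, fixed = TRUE)
  has_I <- grepl("+I", loci$model, fixed = TRUE)
  rates <- ifelse(has_G & has_I, "invgamma",
           ifelse(has_G, "gamma", ifelse(has_I, "propinv", "equal")))
  idx <- seq_len(nrow(loci))
  equal_freq <- base %in% c("JC", "K80", "SYM")  # equal base frequencies
  c("begin mrbayes;",
    sprintf("  charset %s = %d-%d;", loci$name,
            as.integer(loci$start), as.integer(loci$end)),
    sprintf("  partition by_locus = %d: %s;", nrow(loci),
            paste(loci$name, collapse = ", ")),
    "  set partition = by_locus;",
    sprintf("  lset applyto=(%d) nst=%d rates=%s;", idx, nst, rates),
    sprintf("  prset applyto=(%d) statefreqpr=fixed(equal);", idx[equal_freq]),
    "  unlink statefreq=(all) revmat=(all) shape=(all) pinvar=(all);",
    "  prset applyto=(all) ratepr=variable;",
    "end;")
}

# Hypothetical loci and models, for illustration only.
loci <- data.frame(
  name  = c("ITS", "matK", "rbcL"),
  start = c(1, 701, 1501),
  end   = c(700, 1500, 2900),
  model = c("GTR+G+I", "HKY+G", "K80"),
  stringsAsFactors = FALSE
)
block <- mrbayes_block(loci)
cat(block, sep = "\n")
```

Each locus gets its own lset line, and unlink lets the partitions estimate their parameters independently, which is what keeps the model choice tied to each locus.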

Choosing the output directory

The argument dir controls where model-selection results are written.

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  dir = "RESULTS_evomodelTest"
)

By default, the function creates a results folder with a date-stamped subdirectory. This helps keep multiple model-selection runs organized.

Verbose output

The argument verbose controls whether progress messages are printed during execution.

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  verbose = TRUE
)

To suppress progress messages:

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  verbose = FALSE
)

This can be useful when rendering tutorials or running batch workflows.

A complete example workflow

A typical workflow might look like this:

Step 1. Export a concatenated NEXUS dataset

writeNexus(
  catdf,
  file = "Vataireoids.nex",
  genomics = FALSE,
  interleave = TRUE,
  bayesblock = FALSE
)

Step 2. Run model selection and append a MrBayes block

evomodelTest(
  nexus_file_path = "Vataireoids.nex",
  model_criteria = "BIC",
  models_to_test = "standard",
  include_G = TRUE,
  include_I = TRUE,
  append_mrbayes_to_nexus = TRUE,
  overwrite_original_nexus = FALSE,
  dir = "RESULTS_evomodelTest"
)

At this point, you have:

  • a model-selection report
  • a NEXUS file with a generated MrBayes block
  • a prepared input for downstream Bayesian analysis

Typical workflow for partitioned multilocus analyses

A particularly useful workflow is:

  1. align loci separately
  2. export each locus as NEXUS
  3. export a concatenated partitioned matrix with writeNexus()
  4. run evomodelTest() on the separate loci
  5. append the partitioned MrBayes block to the concatenated NEXUS file
  6. run the resulting file with mrbayesRun()

This keeps model choice tied to each locus while preparing a combined Bayesian analysis.

The results directory may also contain:

  • a detailed text report
  • modified NEXUS files with appended MrBayes blocks
  • summaries of the selected models

Inspecting these outputs is useful before proceeding to Bayesian inference.

Common issues

Input file is not in NEXUS format

evomodelTest() expects NEXUS alignment files. If your data are in another format, convert them first using convertAlign() or export them with writeNexus().
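
A quick check before running model selection can catch this problem early. NEXUS files begin with the token #NEXUS, so the sketch below tests the first non-empty line of a file. The helper is_nexus() is hypothetical and not part of catGenes, and it checks only the header, not the validity of the whole file.

```r
# Sketch: NEXUS files start with the token "#NEXUS" (case-insensitive).
# is_nexus() is a hypothetical helper; it inspects the header only.
is_nexus <- function(path) {
  if (!file.exists(path)) return(FALSE)
  first <- readLines(path, n = 20, warn = FALSE)
  first <- first[nzchar(trimws(first))]   # drop leading blank lines
  length(first) > 0 && grepl("^#NEXUS", trimws(first[1]), ignore.case = TRUE)
}

# Possible usage (not run here):
# files <- c("ITS.nex", "matK.nex", "rbcL.nex")
# vapply(files, is_nexus, logical(1))
```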

Model comparison takes too long

Large alignments or many candidate models can increase runtime. To reduce computation time, you can:

  • reduce the number of models tested
  • reduce the dataset size earlier in the workflow (for example, with a lower retmax-style limit when retrieving sequences)
  • increase mc.cores
  • set models_to_test to "standard" rather than "all"

Original NEXUS file is overwritten unintentionally

Be cautious when using overwrite_original_nexus = TRUE. If you want to preserve the original matrix, keep this option set to FALSE.

The MrBayes block is not appended where expected

Make sure:

  • append_mrbayes_to_nexus = TRUE
  • the nexus_file_path is correct
  • combined_nexus_file_path is provided when working with a separate concatenated matrix

The selected model does not match expectations

Different criteria (AIC vs BIC) or different choices for include_G and include_I can change the best-fitting model. This is normal and should be interpreted in the context of the dataset and analytical goals.

Next step

Once model selection has been completed and the MrBayes block has been prepared, the next step is to run the Bayesian analysis itself using mrbayesRun() or an external MrBayes workflow.