# MrBayes

Building trees with MrBayes

Bayesian inference

Phylogenetic analysis using Bayesian method

MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters. Here I have described two options of using MrBayes (i) command line and (ii) webserver.

ðŸ”— **Useful resources and links**

- Homepage
- Manual v3.2
- Tutorial
- John P. Huelsenbeck and Fredrik Ronquist, Bayesian Analysis of Molecular Evolution Using MrBayes

## Command line

Instructions for commandline workflow on MacOS for MrBayes v3.2

**Install**

Instructions on how to installl MrBayes are available here and here.

Download source code for compliation on Unix machines 3.2.7a.

Open a terminal window and execute the following, remember you may need to navigate to change directory or move download file to currently working directory.

```
~ tar zxf mrbayes-3.2.tar.gz
~ cd mrbayes-3.2
~/mrbayes-3.2 $ ./configure
~/mrbayes-3.2 $ make
~/mrbayes-3.2 $ export CC=MyCompiler
~/mrbayes-3.2 % ./configure
```

You should now be ready to start!

### 1. Nexus file format

You will need to convert your aligned sequence file to a nexus file first. Although this can commonly be done in most sequence analysis programs (e.g. geneious) using the `export as...`

make sure you check the file formatting by opening the `nexus`

file in a text editor program (e.g. atom).

I find a more reliable way is to use a web program to convert the fasta file to a nexus file - such as one available here. Make sure the sequence names do not contain any of the following; hyphens `-`

, commas`,`

, semicolcons `:`

, exclamation marks `!`

. Underscores `_`

and periods `.`

are okay.

### 2. Activate MrBayes

Type `mb`

to activate MrBayes.

Set the data file for analysis

*Example* - alignment file called `AlignedSeqs.nexus`

This will check will ensure your data is formatting and read correctly by the program. Any issues at this point are most likely due to the syntax and formatting of your nexus file. Note: you may need to also include the path to the file if you are not already in the correct directory. Generic Shell commands (e.g. `ls`

`cd`

) do not work in the MrBayes environment.

### 3. Specify the evolutionary model

This is where you start to set the parameters for your analysis. You will likely need to do some background work to figure out which is the correct model for your data - see here.

If your data are not DNA or RNA, if you want to invoke a different model, or if you want to use non-default priors, refer to the MrBayes manual and Appendix for more help.

**Specifying a model**

Type `help lset`

for more information

Parameter | Options | Current setting |
---|---|---|

Nucmodel | 4by4/Doublet/Codon/Protein | 4by4 |

Nst | 1(JC/F81)/2(HKY)/6(GTR)/Mixed | 1 |

Code | Universal/Vertmt/Mycoplasma/Yeast/Ciliates/Metmt | Universal |

Ploidy | Haploid/Dilpoid/Zlinked | Diploid |

Rates | Equal/Gamma/Propinv/Invgamma/Adgamma | Equal |

Ngammacat | number | 4 |

Nbetacat | number | 5 |

Omegavar | Equal/Ny93/M3 | Equal |

Covarion | No/Yes | No |

Coding | All/Variable/Noabsencesites/Nopresencesites | All |

Parsmodel | No/Yes | No |

*Example* - Set to a GTR + I + G To set a General Time Reversible (GTR) model with Gamma (+G) invariable (+I) distribution, execute the following:

`lset nst=6 rates=invgamma`

### 4. Setting the Priors

**Priors** Type `help prset`

for more information

Parameter | Options | Current setting |
---|---|---|

Tratiopr | Beta/Fixed | Beta(1.0,1.0) |

Revmatpr | Dirichlet/Fixed | Dirichlet(1.0,1.0,1.0,1.0,1.0,1.0) |

Aamodelpr | Fixed/Mixed | Fixed(Poisson) |

Aarevmatpr | Dirichlet/Fixed | Dirichlet(1.0,1.0,â€¦) |

Omegapr | Dirichlet/Fixed | Dirichlet(1.0,1.0) |

Ny98omega1pr | Beta/Fixed | Beta(1.0,1.0) |

Ny98omega3pr | Uniform/Exponential/Fixed | Exponential(1.0) |

M3omegapr | Exponential/Fixed | Exponential |

Codoncatfreqs | Dirichlet/Fixed | Dirichlet(1.0,1.0,1.0) |

Statefreqpr | Dirichlet/Fixed | Dirichlet(1.0,1.0,1.0,1.0) |

Shapepr | Uniform/Exponential/Fixed | Uniform(0.0,200.0) |

Ratecorrpr | Uniform/Fixed | Uniform(-1.0,1.0) |

Pinvarpr | Uniform/Fixed | Uniform(0.0,1.0) |

Covswitchpr | Uniform/Exponential/Fixed | Uniform(0.0,100.0) |

Symdirihyperpr | Uniform/Exponential/Fixed | Fixed(Infinity) |

Topologypr | Uniform/Constraints | Uniform |

Brlenspr | Unconstrained/Clock | Unconstrained:Exp(10.0) |

Treeheightpr | Exponential/Gamma | Exponential(1.0) |

Speciationpr | Uniform/Exponential/Fixed | Uniform(0.0,10.0) |

Extinctionpr | Uniform/Exponential/Fixed | Uniform(0.0,10.0) |

Sampleprob | number | 1.00 |

Thetapr | Uniform/Exponential/Fixed | Uniform(0.0,10.0) |

Nodeagepr | Unconstrained/Calibrated | Unconstrained |

Treeagepr | Fixed/Uniform/Offsetexponential | Fixed(1.00) |

Clockratepr | Strict/Cpp/Bm | Strict |

Cppratepr | Fixed/Exponential | Exponential(0.10) |

Psigammapr | Fixed/Exponential/Uniform | Fixed(1.00) |

Nupr | Fixed/Exponential/Uniform | Fixed(1.00) |

Ratepr | Fixed/Variable=Dirichlet | Fixed |

*Example* - Set the G and I parameters To manually set the shape parameter of the gamma distribution of rate variation (`Shapepr`

) and the proportion of invariable sites (`Pinvarpr`

), execute the following:

`prset shapepr=fixed(0.05) pinvarpr=fixed(0.30)`

### 5. Running Analysis

The analysis is started by issuing the mcmc command. However, before doing this, we recommend that you review the run settings by typing help mcmc. The help mcmc command will produce the following table at the bottom of the output:

Parameter | Options | Current setting |
---|---|---|

Ngen | number | 1000000 |

Nruns | number | 2 |

Nchains | number | 4 |

Temp | number | 0.100000 |

Reweight | number,number | 0.00 v 0.00 ^ |

Swapfreq | number | 1 |

Nswaps | number | 1 |

Samplefreq | number | 500 |

Printfreq | number | 500 |

Printall | Yes/No | Yes |

Printmax | number | 8 |

Mcmcdiagn | Yes/No | Yes |

Diagnfreq | number | 5000 |

Diagnstat | Avgstddev/Maxstddev | Avgstddev |

Minpartfreq | number | 0.10 |

Allchains | Yes/No | No |

Allcomps | Yes/No | No |

Relburnin | Yes/No | Yes |

Burnin | number | 0 |

Burninfrac | number | 0.25 |

Stoprule | Yes/No | No |

Stopval | number | 0.05 |

Savetrees | Yes/No | No |

Checkpoint | Yes/No | Yes |

Checkfreq | number | 100000 |

Filename | name | primates.nex |

Startparams | Current/Reset | Current |

Starttree | Current/Random/Parsimony | Current |

Nperts | number | 0 |

Data | Yes/No | Yes |

Ordertaxa | Yes/No | No |

Append | Yes/No | No |

Autotune | Yes/No | Yes |

Tunefreq | number | 100 |

*Example*

`mcmc ngen=1100000 samplefreq=200 printfreq=200 diagnfreq=2000`

If the standard deviation of split frequencies is below 0.01 the specific number of generations, stop the run by answering no when the program asks `Continue the analysis? (yes/no)`

. Otherwise, keep adding generations until the value falls below `0.01`

. If you are interested mainly in the well-supported parts of the tree, a standard deviation below `0.05`

may be adequate.

Type `sump`

to summarize the parameter values using the same burn-in as the diagnostics in the mcmc command. The program will output a table with summaries of the samples of the substitution model parameters, including the mean, mode, and 95 % credibility interval (region of Highest Posterior Density, HPD) of each parameter. Make sure that the potential scale reduction factor (PSRF) is reasonably close to 1.0 for all parameters; if not, you need to run the analysis longer.

Summarize the trees using the same burn-in as the mcmc command by typing `sumt`

. The program will output a cladogram with the posterior probabilities for each split (file ending `.trprobs`

) and a phylogram with mean branch lengths (file ending `.con.tre`

). Both trees will also be printed to a file that can be read by FigTree and other tree-drawing programs, such as `TreeView`

and `Mesquite`

.

It does not have to be more complicated than this; however, as you get more proficient you will probably want to know more about what is happening behind the scenes. The rest of this section explains each of the steps in more detail and introduces you to all the implicit assumptions you are making and the machinery that MrBayes uses in order to perform your analysis.

### Altering the model

**GTR**

GTR `lset nst=6`

GTR + G `lset nst=6 rates=gamma`

GTR + I `lset nst=6 rates=inv`

GTR + G + I `lset nst=6 rates=gammainv`

**K2P**

Use the HKY model and set the stationary state frequencies to be equal, this will give you a K2P model

K2P `lset nst=2`

`prset statefreqpr=fixed(equal)`

K2P + G `lset nst=2 rates=gamma`

`prset statefreqpr=fixed(equal)`

K2P + I `lset nst=2 rates=inv`

`prset statefreqpr=fixed(equal)`

K2P + G + I `lset nst=2 rates=invgamma`

`prset statefreqpr=fixed(equal)`

**HKY**

HKY `lset nst=2`

HKY + G `lset nst=2 rates=gamma`

HKY + I `lset nst=2 rates=inv`

HKY + G + I `lset nst=2 rates=gammainv`

## Web server

You can also use MrBayes via a web server, There is a variety to choose, check their limitations with regards to compute power to ensure it is going to work for you.

Weâ€™ll use the sever by MABL, MrBayes 3.2.6 for this example. Some limitations of this web server are:

- Max number of sequences = 30
- Max number of generations for MCMC analysis = 100,000

The interface looks like this:

### 1. Upload data

Upload your alignment either by pasting into box provided or uploading file. File formats accepted `FASTA`

, `Phylip`

, `Clustal`

, `EMBL`

or `NEXUS`

.

### 2. Setting

Select your desired settings. If you are unsure below is a suggestions of where to start for analysing nucleotide data.

Likelihood model

- Number of substitution types:
`6 (GTR)`

- Substitution model:
`4by4`

- Rates variation across sites:
`invgamma`

Markov Chain Monte Carlo parameters

- Number of generations:
`100 000`

- Sample a tree every
`100`

generations

Summarize results

- Discard first
`500`

trees sampled (burnin)

### 3. Submit

Enter your email to receive a notification when analysis is complete and click `Submit`

### 4. Output

Click on the link in the email and it will take you to this screen

Download the tree by right clicking on `Tree in Newick format`

under the heading **Outputs** and select `Save link as...`

with the `.tree`

file format. You can open the file in something like FigTree, node/branch labels reflect the poster probabilities of the analysis.

The analysis information is available by downloading the `MrBayes logs`

file under the **Outputs** heading.