Ebola virus phylogenetic analysis protocol
Nanopore | bioinformatics
Licence: Creative Commons Attribution 4.0 International License
- This document is part of the Ebola virus Nanopore sequencing protocol package:
- Ebola virus Nanopore sequencing protocol:
- Setting up the laptop computing environment using Conda:
- Ebola virus Nanopore bioinformatics protocol:
Set up the computing environment as described in this document: ebov-it-setup
This protocol also assumes that the setup and installation steps of the bioinformatics protocol have been performed as described in this document: ebov-bioinformatics-sop.
Activate the ARTIC Conda environment:
source activate artic-ebov
An alignment of 35 complete or nearly complete genomes spanning 1976-2014 is available. It has representatives from many of the Middle Africa outbreaks in DRC, Gabon and Republic of Congo, and 3 from the West African outbreak in Guinea in 2014. This set is intended to provide a framework in which to place new, previously uncharacterised outbreak sequences.
This is a FASTA format file which contains a multiple alignment of the 35 genomes.
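Before using an alignment file it can help to confirm that it really is an alignment, meaning every sequence has the same length once gaps are included. The following is a minimal sketch using only the Python standard library; the function names `read_fasta` and `check_alignment` are illustrative and not part of the ARTIC tooling, and the toy sequences are made up.

```python
from io import StringIO


def read_fasta(handle):
    """Return a list of (header, sequence) tuples from an open FASTA handle."""
    records = []
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_alignment(records):
    """Return (number of sequences, alignment length).

    Raises ValueError if the sequences do not all have the same length.
    """
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"sequences have differing lengths: {sorted(lengths)}")
    return len(records), (lengths.pop() if lengths else 0)


# Toy input (made-up sequences) standing in for a real alignment file.
toy = StringIO(">a|x|DRC|1976\nACGT-A\nCC\n>b|y|Gabon|1996\nACGTTA\nCC\n")
n_seqs, aln_len = check_alignment(read_fasta(toy))
print(n_seqs, aln_len)  # 2 8
```

For the reference set, the same functions could be called on an opened `ebov-reference-genomes-35.fasta`, where 35 sequences of equal length would be expected.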
To test these instructions you can find a synthetic genome in the ARTIC repository. This file, called fake_ebola_genome.fasta, is an artificial genome sequence constructed to fall within the diversity of the 35 provided reference genomes, to simulate the discovery of a new lineage of EBOV. It is for testing only and shouldn’t be included in any analyses.
Building a multiple alignment
Use MUSCLE multiple alignment software to align the new genome consensus sequences to the existing reference genome alignment:
muscle -profile -in1 ebov-reference-genomes-35.fasta -in2 new_genomes.fasta -fastaout aligned.afa
This method keeps the existing alignment and pairwise-aligns the new sequence to it.
The -profile option is much quicker than doing a full multiple alignment but could be problematic if the new genome is divergent from all the reference genomes. In that case it may be worth doing a full re-alignment.
- Optional step – To re-align an existing alignment:
muscle -in aligned.afa -out re-aligned.afa -refine
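One rough way to judge whether a new genome is divergent from all the references is to compute its proportion of differing sites (p-distance) against each reference in the alignment. The sketch below assumes the sequences have already been read into Python strings of equal length; the names `p_distance` and `nearest_reference` are hypothetical, the toy sequences are made up, and no particular distance threshold is implied by this protocol.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable columns between the two sequences")
    return differences / compared


def nearest_reference(new_seq, references):
    """Return (name, distance) of the reference closest to new_seq."""
    distances = {name: p_distance(new_seq, seq) for name, seq in references.items()}
    name = min(distances, key=distances.get)
    return name, distances[name]


# Toy aligned sequences (made up for illustration).
references = {"ref1": "ACGTACGTAC", "ref2": "ACGTTCGTAA"}
new_genome = "ACGTAC-TAT"
closest, distance = nearest_reference(new_genome, references)
print(closest, round(distance, 3))
```

If the smallest distance is much larger than the distances among the references themselves, that is a hint that the profile alignment deserves a closer look.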
Inferring a phylogenetic tree
PhyML reads alignments in PHYLIP format, so first convert the alignment using Goalign:
goalign reformat phylip -i aligned.afa > aligned.phy
Then build the tree. This will use the default nucleotide model (HKY with gamma-distributed site rate heterogeneity):
phyml --input aligned.phy --datatype nt
The output goes into two files:
aligned.phy_phyml_stats.txt provides all the estimated parameter values and other information,
aligned.phy_phyml_tree.txt is the resulting tree in NEWICK format.
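To confirm that the tree contains every expected sequence, and to see the exact tip names, the tip labels can be pulled out of the Newick text. This is a minimal sketch that assumes unquoted tip labels; it is a simple regular expression, not a full Newick parser, and the function name `tip_labels` and the toy tree are illustrative only.

```python
import re


def tip_labels(newick):
    """Return the tip labels found in a Newick string.

    A tip label is taken to be any text that directly follows '(' or ','
    and runs up to the next ':', ',', ')', '(' or ';'. Labels and support
    values on internal nodes follow ')' and are therefore not matched.
    """
    return [label.strip() for label in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


# Toy tree (made up) with a support value on the internal node.
toy_tree = "((A|x|DRC|1976:0.1,B|y|Gabon|1996:0.2)0.95:0.05,C|z|Guinea|2014:0.3);"
tips = tip_labels(toy_tree)
print(len(tips), tips)
```

Applied to the contents of `aligned.phy_phyml_tree.txt`, the number of tips should equal the number of sequences in the alignment.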
By default, an ML tree is arbitrarily rooted. To help with the interpretation of the tree, use the Gotree utility to re-root it so that the 1970s viruses are at the root:
gotree reroot outgroup -i aligned.phy_phyml_tree.txt 'KC242791|Bonduni|DRC|1977-06' 'KC242801|deRoover|DRC|1976' 'KM655246|Yambuku-Ecran|DRC|1976' > rooted.tree
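The outgroup names in the command above appear to follow an `accession|strain|country|date` pattern. Assuming all sequence names use that four-field layout (an inference from the three names shown, not something this protocol states), the 1970s tips could be selected programmatically rather than typed by hand. The function names and the fourth header below are hypothetical.

```python
import shlex


def parse_header(header):
    """Split 'accession|strain|country|date' into a dict with the year as an int."""
    accession, strain, country, date = header.split("|")
    return {
        "accession": accession,
        "strain": strain,
        "country": country,
        "date": date,
        "year": int(date[:4]),  # dates may be 'YYYY' or 'YYYY-MM'
    }


def outgroup_tips(headers, first_year=1970, last_year=1979):
    """Return the headers whose year falls within the given range (inclusive)."""
    return [h for h in headers if first_year <= parse_header(h)["year"] <= last_year]


headers = [
    "KC242791|Bonduni|DRC|1977-06",
    "KC242801|deRoover|DRC|1976",
    "KM655246|Yambuku-Ecran|DRC|1976",
    "EXAMPLE01|hypothetical|Guinea|2014-03",  # made-up entry for illustration
]
outgroup = outgroup_tips(headers)

# Quote each name so the '|' characters are safe on a shell command line.
print("gotree reroot outgroup -i aligned.phy_phyml_tree.txt "
      + " ".join(shlex.quote(name) for name in outgroup))
```

Quoting matters here because an unquoted `|` would be interpreted by the shell as a pipe.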
ETE3 can be used to open a window to view the resulting tree:
ete3 view -t rooted.tree
EBOV phylogenetics with augur and visualization with auspice
The augur package provides a lightweight wrapper around common phylogenetics functionality such as aligning sequences and building phylogenetic trees. Here we use augur to quickly align sequences, build a tree and estimate a temporally dated phylogeny.
Begin by navigating to the augur/ directory within the
The full augur build can be performed by running the supplied
This outputs individual augur commands such as augur align and augur tree as they are run. Please modify the Snakefile to customize the phylogenetic build.
In the course of running the augur build, a maximum-likelihood tree is produced as results/tree_raw.nwk and a time-resolved tree is produced as
Additionally, running augur enables visualization by auspice through the augur export command, which has produced the files auspice/ebov_tree.json. To view the visualization, first install auspice by running:
npm install -g auspice
Then launch the visualization viewer by running:
auspice --data auspice/
You can then navigate to http://localhost:4000/local/ebov to view the interactive phylogeny.