Ebola virus phylogenetic analysis protocol

Nanopore | bioinformatics

Document: ARTIC-EBOV-phylogeneticsSOP-v1.0.0
Creation Date: 2018-05-26
Author: Andrew Rambaut
Licence: Creative Commons Attribution 4.0 International License
Overview: An analysis protocol for an initial phylogenetic analysis of consensus genomes. Includes alignment, phylogeny estimation and visualization.


This document is part of the Ebola virus Nanopore sequencing protocol package:
http://artic.network/ebov/
Ebola virus Nanopore sequencing protocol:
http://artic.network/ebov/ebov-seq-sop.html
Setting up the laptop computing environment using Conda:
http://artic.network/ebov/ebov-it-setup.html
Ebola virus Nanopore bioinformatics protocol:
http://artic.network/ebov/ebov-bioinformatics-sop.html




Funded by the Wellcome Trust
Collaborators Award 206298/Z/17/Z --- ARTIC network

Preparation

Set up the computing environment as described in this document: ebov-it-setup

This protocol also assumes that the bioinformatics protocol has been set up and installed as described in this document: ebov-bioinformatics-sop.

Installing software

Activate the ARTIC Conda environment:

source activate artic-ebov

Reference genomes

An alignment of 35 complete or nearly complete genomes spanning 1976-2014 is available. It includes representatives from many of the Middle Africa outbreaks in DRC, Gabon and Republic of Congo, and three from the West African outbreak in Guinea in 2014. This set is intended to provide a framework in which to place new, previously uncharacterised outbreak sequences.

35 EBOV genome alignment - FASTA file.

This is a FASTA format file which contains a multiple alignment of the 35 genomes.
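
Before using the alignment, it can be worth confirming that the downloaded file is intact. The sketch below is a hypothetical helper (alignment_summary is not part of the ARTIC tooling) that counts the records in a FASTA alignment and checks that every row has the same length. It assumes a POSIX awk is available.

```shell
# Hypothetical helper, not part of the ARTIC tooling.
# Prints "<number of sequences> <alignment length>" for a FASTA alignment,
# and returns non-zero if the file is empty or the rows differ in length.
alignment_summary() {
    awk '
        /^>/ { if (seen) check(); seen = 1; n++; len = 0; next }  # header starts a new record
             { gsub(/[ \t\r]/, ""); len += length($0) }           # sequence lines may be wrapped
        END  {
            if (!seen) exit 1
            check()
            if (bad) exit 1
            print n, width
        }
        function check() {
            if (!have) { width = len; have = 1 }
            else if (len != width) {
                print "record " n " has length " len ", expected " width > "/dev/stderr"
                bad = 1
            }
        }
    ' "$1"
}

# Example: alignment_summary ebov-reference-genomes-35.fasta
```

For the reference file the first number reported should be 35.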

Test data

To test these instructions, you can use a synthetic genome from the ARTIC repository. This file, fake_ebola_genome.fasta, is an artificial genome sequence constructed to fall within the diversity of the 35 provided reference genomes, simulating the discovery of a new lineage of EBOV. It is for testing only and should not be included in any analyses.

Building a multiple alignment

Use the MUSCLE multiple alignment software to align the new genome consensus sequences to the existing reference genome alignment:

muscle -profile -in1 ebov-reference-genomes-35.fasta -in2 new_genomes.fasta -fastaout aligned.afa

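This method keeps the existing alignment unchanged and aligns the new sequence to it pairwise. The profile option expects both inputs to be alignments, so new_genomes.fasta should contain either a single sequence or sequences that are already aligned to one another.

Note: The profile option is much quicker than a full multiple alignment but can be problematic if the new genome is divergent from all the reference genomes. In that case it may be worth doing a full re-alignment.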

  • Optional step – To re-align an existing alignment:
    muscle -in aligned.afa -out re-aligned.afa -refine
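
One rough way to judge whether the new genome has aligned sensibly is to compare its proportion of gap characters with that of the reference sequences. The sketch below is a hypothetical helper (gap_fractions is an illustrative name), and the gap fraction is only a heuristic, not a substitute for inspecting the alignment.

```shell
# Hypothetical helper, not part of the ARTIC tooling.
# Prints "<sequence name><TAB><fraction of gap characters>" for each record
# of a FASTA alignment.
gap_fractions() {
    awk '
        /^>/ { if (name != "") report(); name = substr($0, 2); gaps = 0; total = 0; next }
             {
                 gsub(/[ \t\r]/, "")
                 total += length($0)
                 gaps += gsub(/-/, "-")   # gsub returns the number of "-" characters replaced
             }
        END  { if (name != "") report() }
        function report() { printf "%s\t%.3f\n", name, (total ? gaps / total : 0) }
    ' "$1"
}

# Example: gap_fractions aligned.afa
```

A new sequence with a much higher gap fraction than the references may be a sign that the profile alignment has struggled and a full re-alignment is worth trying.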
    

Inferring a phylogenetic tree

We will infer a phylogenetic tree using maximum likelihood (ML) with PhyML. PhyML reads alignments in PHYLIP format, so use the Goalign utility to convert from FASTA format:

goalign reformat phylip -i aligned.afa > aligned.phy
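
A PHYLIP file begins with a header line giving the number of taxa and the number of sites, which makes a quick check of the conversion possible. The sketch below uses a hypothetical helper name (phylip_dimensions) to read that header.

```shell
# Hypothetical helper, not part of the ARTIC tooling.
# Prints "<taxa> <sites>" from the first non-empty line of a PHYLIP file.
phylip_dimensions() {
    awk 'NF { print $1, $2; exit }' "$1"
}

# Example: compare with the number of FASTA records in the input
#   phylip_dimensions aligned.phy
#   grep -c '^>' aligned.afa
```

The first number should equal the count of sequences in aligned.afa, that is, the 35 references plus the new genomes.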

Then build the tree. This will use the default nucleotide model (HKY with gamma-distributed rate heterogeneity among sites):

phyml --input aligned.phy --datatype nt

The output goes into two files: aligned.phy_phyml_stats.txt contains the estimated parameter values and other information, and aligned.phy_phyml_tree.txt contains the resulting tree in Newick format.
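
To confirm that the tree contains every sequence from the alignment, the tips can be counted directly from the Newick file. This is a minimal sketch with a hypothetical helper (count_tips). It assumes the tip labels contain no commas, which holds for the pipe-delimited labels used in this protocol.

```shell
# Hypothetical helper, not part of the ARTIC tooling.
# Counts the tips in a single Newick tree as (number of commas + 1).
# Assumes no tip label contains a comma.
count_tips() {
    commas=$(tr -cd ',' < "$1" | wc -c)
    echo $((commas + 1))
}

# Example: count_tips aligned.phy_phyml_tree.txt
```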

By default an ML tree is arbitrarily rooted. To help with the interpretation of the tree, use the Gotree utility to re-root it so that the 1970s viruses are at the root:

gotree reroot outgroup -i aligned.phy_phyml_tree.txt 'KC242791|Bonduni|DRC|1977-06' 'KC242801|deRoover|DRC|1976' 'KM655246|Yambuku-Ecran|DRC|1976' > rooted.tree
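
If re-rooting fails or the root is not where you expect, one thing to check is that the outgroup names match the tip labels in the tree exactly. The sketch below is a hypothetical helper (missing_tips) that lists any names not found in the Newick file. It uses a fixed-string substring search, so it cannot distinguish a label from a longer label that contains it.

```shell
# Hypothetical helper, not part of the ARTIC tooling.
# Usage: missing_tips TREE NAME [NAME...]
# Prints each NAME that does not occur in TREE; returns non-zero if any are missing.
missing_tips() {
    tree=$1
    shift
    rc=0
    for name in "$@"; do
        # -F matches a fixed string, so "|" is not treated as a regex operator
        if ! grep -qF -- "$name" "$tree"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   missing_tips aligned.phy_phyml_tree.txt 'KC242791|Bonduni|DRC|1977-06' 'KC242801|deRoover|DRC|1976'
```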

ETE3 can be used to open a window to view the resulting tree:

ete3 view -t rooted.tree

EBOV phylogenetics with augur and visualization with auspice

The augur package provides a lightweight wrapper around common phylogenetics functionality such as aligning sequences and building phylogenetic trees. Here we use augur to quickly align sequences, build a tree and estimate a temporally dated phylogeny.

Begin by navigating to the augur/ directory within the artic-ebov repo:

cd augur

The full augur build can be performed by running the supplied Snakefile with:

snakemake -p

This prints the individual augur commands, such as augur parse, augur align and augur tree, as they are run. To customize the phylogenetic build, modify the Snakefile.

In the course of running the augur build, a maximum-likelihood tree is produced as results/tree_raw.nwk and a time-resolved tree is produced as results/tree.nwk.
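
After the build finishes, a quick check that the expected tree files exist and are not empty can catch a build that stopped early. The sketch below is a hypothetical helper (check_outputs) and takes the file names as arguments.

```shell
# Hypothetical helper, not part of the ARTIC tooling.
# Usage: check_outputs FILE [FILE...]
# Reports each FILE that is missing or empty; returns non-zero if any are.
check_outputs() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            rc=1
        fi
    done
    return $rc
}

# Example: check_outputs results/tree_raw.nwk results/tree.nwk
```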

Additionally, the augur export command run as part of the build produces the files auspice/ebov_meta.json and auspice/ebov_tree.json, which auspice uses for visualization. To view the visualization, first install auspice by running:

npm install -g auspice

Then launch the visualization viewer by running:

auspice --data auspice/

You can then navigate to http://localhost:4000/local/ebov to view the interactive phylogeny.