Ebola virus bioinformatics protocol
Nanopore | bioinformatics
Licence: Creative Commons Attribution 4.0 International License
This document is part of the Ebola virus Nanopore sequencing protocol package:
- Ebola virus Nanopore sequencing protocol
- Setting up the laptop computing environment using Conda
- Phylogenetic analysis and visualization
Set up the computing environment as described in the ebov-it-setup document. Do this and test it before sequencing, particularly if the sequencing will take place where internet access is unavailable, slow, or unreliable. Once the environment is set up, the bioinformatics can be performed largely off-line.
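Because the setup needs to be tested before going off-line, a quick check that the command-line tools are on the PATH can help. The following is a minimal sketch, not part of the ARTIC package: the function name check_tools is hypothetical, and the tool list is taken from the commands and tools named in this protocol. Run it inside the activated environment.

```shell
# Report which of the given commands are missing from PATH.
# Prints one missing name per line; returns 0 if all are present, 1 otherwise.
# check_tools is a hypothetical helper, not an ARTIC command.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in this protocol (assumed executable names).
if check_tools read_fast5_basecaller.py artic nanopolish porechop; then
    echo "all tools found"
else
    echo "the tools listed above are missing from PATH" >&2
fi
```

This only confirms that the executables can be found; it does not replace a test run on real data.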
Activate the ARTIC environment:
source activate artic-ebov
Basecalling with Albacore (MinION on laptop)
Run the Albacore basecaller on the new MinION run folder:
read_fast5_basecaller.py -c r94_450bps_linear.cfg -i /path/to/reads -s run_name -o fastq -t 4 -r --barcoding
You need to replace /path/to/reads with the folder containing the FAST5 files from your run.
Common locations are:
This will create a folder called run_name with the base-called reads in it.
Consensus sequence generation
Gather up the FASTQ output from Albacore:
artic gather --min-length 400 --max-length 700 --prefix run_name basecalled_reads
basecalled_reads should be the folder in which Albacore put the base-called reads (i.e., run_name from the command above).
We apply a length filter of 400 to 700 bases here to remove obviously chimeric reads.
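To illustrate what a length filter of this kind does, here is a small stand-alone sketch using awk on a toy FASTQ file. It is not how artic gather is implemented; the helper names make_read and filter_by_length are hypothetical, and the inclusive bounds and simple four-line FASTQ records are assumptions.

```shell
# Write one toy FASTQ record of a given length (hypothetical helper).
make_read() {
    awk -v name="$1" -v n="$2" 'BEGIN {
        s = ""; q = ""
        for (i = 0; i < n; i++) { s = s "A"; q = q "I" }
        print "@" name; print s; print "+"; print q
    }'
}

# Keep only records whose sequence length lies in [min, max] (inclusive).
# Assumes four-line FASTQ records with no wrapped sequence lines.
filter_by_length() {
    awk -v min="$1" -v max="$2" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            n = length(seq)
            if (n >= min && n <= max) {
                print header; print seq; print plus; print $0
            }
        }
    ' "$3"
}

# Toy input: a short fragment, an amplicon-sized read and an over-long read.
demo=$(mktemp)
{ make_read short 300; make_read amplicon 550; make_read chimera 900; } > "$demo"

# Show the names of the reads that survive the 400 to 700 filter.
filter_by_length 400 700 "$demo" | awk 'NR % 4 == 1'
# prints: @amplicon
```

The 900-base read stands in for a chimera of two amplicons joined together, which is the kind of read the upper bound is meant to exclude.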
Basecalling using MinIT or GridION
If you are running on MinIT or GridION and have used Guppy to basecall through Dogfish, you can instead run:
artic gather --guppy --min-length 400 --max-length 700 --prefix run_name /data/basecalled/path/to/reads
You will now have a file called:
and a file called
as well as individual files for each barcode (if previously demultiplexed).
Demultiplex with Porechop with stringent settings
This stage is obligatory, even if you have already demultiplexed with Albacore, due to significant barcoding misassignments that can confound results:
artic demultiplex --threads 4 run_name_all.fastq
Now you will have new files called:
run_name_all_BC01.fastq
run_name_all_BC02.fastq
run_name_all_BC03.fastq
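Before moving on, it can be useful to see how many reads ended up in each barcode file. The sketch below is one way to do that under two assumptions: the files follow the run_name_all_BC*.fastq naming listed above, and each FASTQ record occupies exactly four lines. The function names count_reads and report_barcodes are hypothetical.

```shell
# Count reads in a four-line-per-record FASTQ file.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print "file<TAB>reads" for each demultiplexed file with the given prefix.
# Prints nothing if no matching files exist.
report_barcodes() {
    local f
    for f in "$1"_BC*.fastq; do
        [ -e "$f" ] || continue
        printf '%s\t%s\n' "$f" "$(count_reads "$f")"
    done
}

# Run from the folder that holds the demultiplexed files.
report_barcodes run_name_all
```

A barcode with very few reads compared with the others may be a barcode that was not used in the run.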
Create the nanopolish index (once per sequencing run, not per sample)
nanopolish index -s run_name_sequencing_summary.txt -d /path/to/reads run_name_all.fastq
Replace /path/to/reads with the original location of the FAST5 files.
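The index command depends on three inputs: the sequencing summary, the FAST5 folder and the gathered FASTQ file. A minimal pre-flight check, assuming the file names used in the command above, might look like this. The function name check_index_inputs is hypothetical.

```shell
# Return 0 only if the summary file, the FAST5 directory and a non-empty
# FASTQ file all exist; otherwise report what is missing on stderr.
check_index_inputs() {
    local ok=0
    [ -f "$1" ] || { echo "missing sequencing summary: $1" >&2; ok=1; }
    [ -d "$2" ] || { echo "missing FAST5 directory: $2" >&2; ok=1; }
    [ -s "$3" ] || { echo "missing or empty FASTQ: $3" >&2; ok=1; }
    return "$ok"
}

# File names as in the nanopolish index command; adjust /path/to/reads.
if check_index_inputs run_name_sequencing_summary.txt /path/to/reads run_name_all.fastq; then
    echo "inputs look complete; nanopolish index can be run"
fi
```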
Run the MinION pipeline
For each barcode you wish to process:
artic minion --normalise 200 --threads 4 --scheme-directory artic-ebov/primer-schemes --read-file run_name_final_NB01.fastq --nanopolish-read-file run_name_all.fastq ZaireEbola/V2 samplename
Replace samplename as appropriate. The output files are:
- samplename.primertrimmed.bam: BAM file for visualisation after primer-binding site trimming
- samplename.vcf: detected variants in VCF format
- samplename.variants.tab: detected variants
- samplename.consensus.fasta: consensus sequence
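When several barcodes need processing, the per-barcode command can be generated in a loop. The sketch below only prints the commands so they can be reviewed before running. The flags and scheme are copied from the artic minion command above; the function name minion_cmd, the sample names and the pairing of read files with samples are hypothetical, and the read file names assume the demultiplexing output listed earlier. Adjust them to match the files from your run.

```shell
# Print (not run) the artic minion command for one barcode.
# $1 = per-barcode read file, $2 = sample name.
minion_cmd() {
    printf 'artic minion --normalise 200 --threads 4 --scheme-directory artic-ebov/primer-schemes --read-file %s --nanopolish-read-file run_name_all.fastq ZaireEbola/V2 %s\n' "$1" "$2"
}

# Hypothetical mapping of barcode read files to sample names.
while read -r read_file samplename; do
    minion_cmd "$read_file" "$samplename"
done <<'EOF'
run_name_all_BC01.fastq sample_A
run_name_all_BC02.fastq sample_B
EOF
```

Printing first keeps the loop safe to try; once the commands look right, the output can be saved to a file and executed.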