Oxford Nanopore Sequencing, Hybrid Error Correction, and de novo Assembly of a Eukaryotic Genome

Sara Goodwin, James Gurtowski, Scott Ethe-Sayers, Panchu Deshpande, Michael C. Schatz, W. Richard McCombie

Monitoring the progress of DNA molecules through a membrane pore has been postulated as a method for sequencing DNA for several decades. Recently, a nanopore-based sequencing instrument, the Oxford Nanopore MinION, became available, and we used it to sequence the S. cerevisiae genome. To make use of these data, we developed a novel open-source hybrid error correction algorithm, Nanocorr (https://github.com/jgurtowski/nanocorr), specifically for Oxford Nanopore reads, because existing packages could not assemble reads this long (5-50 kbp) at such high error rates (between ~5% and 40%). With this new method we performed a hybrid error correction of the nanopore reads using complementary MiSeq data and produced a de novo assembly that is highly contiguous and accurate: the contig N50 length is more than ten times greater than that of an Illumina-only assembly (678 kbp versus 59.9 kbp), and the consensus has greater than 99.88% identity when compared to the reference. Furthermore, the assembly with the long nanopore reads presents a much more complete representation of the features of the genome and correctly assembles gene cassettes, rRNAs, transposable elements, and other genomic features that were almost entirely absent in the Illumina-only assembly.

 Paper Preprint
 Supplementary Tables and Figures
 Nanocorr: Hybrid Error Correction Pipeline for nanopore reads

S. cerevisiae W303 - Oxford Nanopore Data & Assembly

 Reference Genome: S288C Reference Genome
 Oxford Nanopore raw reads: W303_ONT_Raw_reads.fa.gz
 Nanocorr corrected reads: W303_ONT_Nanocorr_Corrected_independent_reads.fa.gz
 Nanocorr assembly (polished): W303_ONT_Assembly.fa.gz
 Nanocorr assembly (raw): W303_ONT_Assembly_pre_pilon.fa.gz
 Nanocorr CA spec file: W303_nanopore.spec

S. cerevisiae W303 - Illumina MiSeq Data & Assembly

 Illumina raw reads source: Illumina BaseSpace
 Illumina raw reads (r1): W303_Miseq_R1.fastq.gz
 Illumina raw reads (r2): W303_Miseq_R2.fastq.gz
 Illumina Flashed Reads: Miseq_Flashed_25x.fastq.gz
 Illumina Assembly: W303_Miseq_Assembly.fa.gz
 Illumina CA spec file: W303_miseq.spec

E. coli K12 - Oxford Nanopore Data & Assembly

 Reference Genome: E. coli K12 MG1655 in NCBI
 Oxford Nanopore raw reads: Available in GigaDB
 Nanocorr corrected reads: ecoli_ONT_Nanocorr_Corrected_reads.fa.gz
 Nanocorr assembly: ecoli_ONT_Assembly_CA.fa.gz
 Nanocorr CA spec file: ecoli_ONT.spec

E. coli K12 - Illumina MiSeq Data & Assembly

 Illumina raw reads: Available from Illumina [mirror R1 and R2]
 Illumina Flashed Reads: ecoli_Miseq_Flashed.fa.gz
 Illumina Assembly: ecoli_Miseq_Assembly.fa.gz
 Illumina CA spec file: ecoli_Miseq.spec
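
Checking contig N50 of a downloaded assembly

The contig N50 values quoted in the abstract can be recomputed from any of the assembly FASTA files listed above. The following is a minimal Python sketch, not part of Nanocorr or of the authors' pipeline. The function names (fasta_lengths, n50, assembly_n50) are hypothetical helpers introduced here for illustration. It assumes the common definition of N50: the length L such that contigs of length L or longer contain at least half of all assembled bases. It also assumes the file is plain or gzip-compressed FASTA.

```python
import gzip
from typing import Iterable, Iterator


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # None until the first '>' header has been seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length  # finish the last record


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def assembly_n50(path: str) -> int:
    """Compute the contig N50 of a FASTA file, gzip-compressed or not."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return n50(fasta_lengths(handle))
```

For example, after downloading one of the assemblies, assembly_n50("W303_ONT_Assembly.fa.gz") returns its contig N50 in base pairs. The result may differ slightly from published figures if the original analysis excluded short contigs or used a different N50 convention.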