Genome Assembly Class

An 8-part lecture series given at the University of Hawaii, August 13-18, 2006. The series covers the entire assembly process, from sequencing reactions to assembly and finishing. The discussion begins with an overview of the assembly process and its theoretical foundations: Lander-Waterman statistics and the Shortest-Common-Superstring problem. Next comes an in-depth discussion of the Celera Assembler, covering the details of overlapping, unitigging, and scaffolding. Lecture 3 introduces AMOS, describing its motivation and framework and briefly surveying some of the currently available tools. Lecture 4 discusses current methods to discover mis-assemblies and presents Hawkeye, the Interactive Genome Visual Analytics tool, which acts as a visual portal for understanding and validating your assembly data. In Lecture 5, I discuss two common problems in assembly, base calling and trimming, and describe AutoEditor and AutoJoiner, second-generation assembly tools that address these areas. Lecture 6, given by Adam Phillippy, covers all aspects of Whole Genome Alignment, centered on the MUMmer suite. The following lecture, also by Adam Phillippy, describes the AMOScmp Comparative Assembler, which uses MUMmer to assemble genomes without the costly overlapping step, even at extremely low coverage. The final lecture serves as a summary of the class and a checklist of potential problem areas one might encounter during whole genome assembly.

1. Genome Assembly: Assembly Concepts and Methods: Lander-Waterman Statistics, Shortest-Common-Superstring
2. Celera Assembler: Theory and Practice: runCA, overlapper, unitigging, scaffolding
3. AMOS: A Modular Open Source Assembler: AMOS overview, runAMOS, AMOS banks, Converters
4. AMOS Assembly Validation and Visualization: Mate-pairs, SNPs, Coverage levels, Hawkeye, stitchContigs, Assembly Repair
5. Improving Assembly without Sequencing: Basecalling, AutoEditor, Trimming, AutoJoiner
6. Whole Genome Alignment: Alignment, Smith-Waterman, MUMmer, Suffix Trees
7. Comparative Genome Assembly: AMOScmp, MUMmer, reference assembly
8. Assembly Checklist: Sequencing, Libraries, Biases, Coverage, Unitigging, Scaffolding
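The two theoretical foundations named for Lecture 1 can be illustrated with a short sketch. This is not taken from the lecture material; it assumes the textbook forms of each idea. For Lander-Waterman, it assumes reads of length L start uniformly at random on a genome of length G, so coverage is c = NL/G, the expected number of contigs is N e^(-c*sigma) with sigma = 1 - T/L for a minimum detectable overlap T, and the expected uncovered fraction is e^(-c). For Shortest-Common-Superstring (an NP-hard problem), it uses the common greedy merge heuristic, which is an approximation and not necessarily how the Celera Assembler or AMOS build contigs. The function names `lander_waterman`, `overlap`, and `greedy_scs` are hypothetical.

```python
import math
from itertools import permutations


def lander_waterman(genome_len, num_reads, read_len, min_overlap=0):
    """Expected statistics under the Lander-Waterman model (hypothetical helper).

    Assumes reads start uniformly at random and two reads are joined only if
    they overlap by at least `min_overlap` bases.
    """
    coverage = num_reads * read_len / genome_len      # c = N * L / G
    sigma = 1.0 - min_overlap / read_len              # sigma = 1 - T / L
    expected_contigs = num_reads * math.exp(-coverage * sigma)
    uncovered_fraction = math.exp(-coverage)          # P(a base is covered by no read)
    return coverage, expected_contigs, uncovered_fraction


def overlap(a, b, min_len=1):
    """Length of the longest suffix of `a` that is a prefix of `b` (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)            # candidate start of a suffix of `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_scs(reads, min_len=1):
    """Greedy approximation to the shortest common superstring (hypothetical helper)."""
    reads = list(dict.fromkeys(reads))                # drop exact duplicates, keep order
    # Reads contained in other reads add nothing to the superstring.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):           # try every ordered pair
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_a, best_b = olen, a, b
        if best_a is None:                            # no overlaps left
            break
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])      # merge along the best overlap
    return "".join(reads)                             # concatenate whatever remains


if __name__ == "__main__":
    print(lander_waterman(1_000_000, 10_000, 500))
    print(greedy_scs(["ABCD", "CDEF", "EFGH"]))
```

For example, 10,000 reads of 500 bp on a 1 Mbp genome give 5x coverage, and the model predicts roughly 67 contigs with about 0.67% of the genome uncovered. The greedy merge of `["ABCD", "CDEF", "EFGH"]` yields `"ABCDEFGH"`. Because greedy merging commits to the largest overlap first, repeats longer than a read can fool it into collapsing distinct copies, which is one motivation for the unitigging and scaffolding steps covered in Lecture 2.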