CSHL Advanced Sequencing Course

More information is available on the course webpage

1. De novo assembly theory and practice

Theory and practice of assembly projects, examining requirements for coverage, read length, and quality, with a focus on Illumina and PacBio sequencing

2. De novo assembly exercise

A hands-on exercise assembling a genome and searching for a secret message encoded in a novel insertion

3. Genome Annotations

Strategies for annotating a genome, including gene prediction, alignment techniques, and high-throughput functional sequencing assays

4. Long Read Assembly

Latest advances in long read assembly. Presented by James Gurtowski

5. Single Cell Analysis

Latest advances in single cell analysis. Presented by Tyler Garvin

CSHL WSBS Genomics and Quantitative Biology

Schatz Lab Research Projects

Overview of my background and research

QB Bootcamp

QB Bootcamp 1: Introduction

The challenges of biological data science

QB Bootcamp 2: Unix scripting

A gentle introduction to working at the command line

QB Bootcamp 3: ORC Exercises

Use sequencing to discover origins of replication


Python programming exercises


Shell programming exercises


Genomics 1: Omics Bootcamp

Introduction to whole genome and exome sequencing, RNA-seq, ChIP-seq, Methyl-seq, and single cell analysis

Genomics 2: ENCODE

History and major findings of ENCODE

Genomics 3: Ancient and Modern Humans

Major results from the 1000 Genomes Project, Neanderthal sequencing, and surname inference

Quantitative Biology

QB Lecture 1. Exact Matching

Introduction to brute force, binary search, suffix arrays and the BWT

QB Lecture 1: BWT Notes

Lecture Notes on the BWT

QB Lecture 2: Dynamic Programming

Fibonacci Numbers, Longest Increasing Subsequence, and Sequence Alignment

QB Lecture 2: Dynamic Programming Notes

Lecture notes on designing a dynamic programming algorithm

QB Lecture 3: Graphs and Genomes

Basic graph algorithms, methods for genome assembly

QB Lecture 4: Gene Finding and HMMs

Microbial and Eukaryotic Gene Finding, Markov Models, HMMs and GHMMs, Forward Algorithm, Viterbi

CSHL Frontiers and Techniques in Plant Science

Genome Sequencing and Assembly

Introduction to de Bruijn graphs for genome assembly

Assembly Tutorial

Assembly tutorial to detect a secret message embedded into a microbial genome

CSHL Undergraduate Research Program in Bioinformatics

Searching for GATTACA

In this class we explored the problem of finding exact occurrences of a query sequence in a large genome or database of sequences. Under this theme, we started by analyzing the brute force approach, introducing the concepts of algorithms, complexity analysis, and E-values. Next we discussed suffix arrays as an index for accelerating the search, including an analysis of the performance of binary search. We also considered two traditional sorting algorithms (Selection Sort versus QuickSort) and their relative performance.

In the second half of the class we discussed finding approximate occurrences of a short query sequence in a large genome or database of sequences. We first defined the problem by considering various metrics of an approximate occurrence, such as Hamming distance or edit distance. We then considered different methods for computing inexact alignments, including brute force global and local alignments and seed-and-extend algorithms. Finally we discussed Bowtie as a Burrows-Wheeler transform based short read mapping algorithm for discovering alignments to a reference genome.
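As a rough illustration of the exact-matching half of the class, the sketch below builds a suffix array for a toy sequence and uses binary search to locate a query. It is not the code used in the course: the function names `build_suffix_array` and `find_occurrences` and the example sequence are made up for this illustration, and the suffix array is built by naive sorting, which is only reasonable for short strings.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction by sorting the suffixes directly; fine for toy inputs,
    far too slow and memory hungry for a real genome.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, query):
    """Return the sorted start positions of exact matches of query in text."""
    m = len(query)

    def prefix(rank):
        # First m characters of the suffix at the given rank.
        start = suffix_array[rank]
        return text[start:start + m]

    # Binary search for the first suffix whose prefix is >= query.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Binary search for the first suffix whose prefix is > query.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= query:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix ranked in [first, lo) begins with query.
    return sorted(suffix_array[first:lo])


genome = "TGATTACAGATTACC"  # hypothetical toy sequence
suffix_array = build_suffix_array(genome)
print(find_occurrences(genome, suffix_array, "GATTACA"))  # [1]
print(find_occurrences(genome, suffix_array, "GATTAC"))   # [1, 8]
```

Each lookup makes O(log n) string comparisons of at most m characters, compared with up to n comparisons for the brute force scan, which is the trade-off the suffix array index is meant to show.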

Python & Bioinformatics

Python Class 1

Introduction to Python, variables, lists, conditions, loops

Python Class 2

Brute force search, dictionaries, motif finding

IPython Notebooks for Probability & Statistics

  1. Rolling a die (Uniform Random Probability)
  2. Flipping a coin (Binomial & Normal Distributions)
  3. Throwing Marbles into Jars (Poisson Distribution)
  4. Throwing Darts (Exponential Distribution)

We also used the exercises at Rosalind throughout the course.

Special topics

Talk by Anne Churchland on balancing work and life.