2013

CSHL Quantitative Biology


IFX0a. High-level intro to genome assembly

Introduction to de Bruijn graphs for genome assembly

IFX0b. CSHL In-House

De novo assembly with long PacBio reads, detection of indels with Scalpel

IFX1. Computational Thinking: Sorting, Searching, and Indexing

Introduction to binary search, suffix arrays, hashing, and the BWT

IFX1b. Notes on the BWT

Scribed notes on computing and searching with the BWT

IFX2. Dynamic Programming: LIS and Sequence Alignment

Applications of dynamic programming to sequence alignment: longest increasing subsequence, edit distance, sequence similarity, BLAST, dynamic time warping

IFX2b. Notes on Dynamic Programming

Scribed notes on dynamic programming

IFX3. Graphs and Genomes

Algorithms for graph searching, detailed look at genome assembly.

IFX4. Gene Finding and HMMs

Approaches to prokaryotic and eukaryotic gene finding, hidden Markov models, forward algorithm, Viterbi algorithm.



CSHL Quantitative Biology Bootcamp


Lecture 1: Biology, Computers, and Python

Milestones in Molecular Biology and the rise of sequencing. Overview of computer systems, introduction to Python and scripting.

Lecture 2: Sequence Alignment and Computational Thinking

Introduction to Alignment and Algorithms, Suffix Arrays, Binary Search

Lecture 3: Genomic Resources

NCBI, UCSC, CSHL Meetings and Courses, Galaxy.

Lecture 4: Unix Scripting

Introduction to Unix, searching the human genome annotation

Lecture 5: Discovering Origins of Replication

Problem by Justin Kinney. Background on tracking DNA replication with next-gen sequencing, walk-through of analysis steps, visualization of discovered replication sites.

Lecture 6: Advanced Origins of Replication Analysis [iPython Notebook]

Problem by Justin Kinney. Plotting, smoothing, and analyzing data

Lecture 7: Transcription Factor Binding Sites [iPython Notebook]

Problem by Justin Kinney. Parsing and discovering transcription factor binding sites

Python Exercises

Exercises on working with Python

Python Exercise Solutions

Solutions to exercises



CSHL Advanced Sequencing Course


Course Wiki

Schedule and archive of presentations.

Whole Genome Assembly and Alignment

De novo assembly theory and practice; whole genome alignment with MUMmer

Assembly Tutorial

Assembly tutorial to detect a secret message embedded into a microbial genome

AdvSeq.asm.tgz

Data for assembly tutorial



CSHL Programming for Biology


Whole Genome Assembly and Alignment

De novo assembly theory and practice; whole genome alignment with MUMmer

Assembly Tutorial

Assembly tutorial to detect a secret message embedded into a microbial genome

P4B.asm.challenge.tgz

Data for assembly tutorial



CSHL Undergraduate Research Program in Bioinformatics


Lecture 1: Sequence Alignment and Computational Thinking

In this class we explored the problem of finding exact occurrences of a query sequence in a large genome or database of sequences. Under this theme, we started by analyzing the brute-force approach, introducing the concepts of algorithm, complexity analysis, and E-values. Next we discussed suffix arrays as an index for accelerating the search, including an analysis of the performance of binary search. We also considered two traditional sorting algorithms (Selection Sort versus QuickSort) and their relative performance. In the second half of the class we discussed finding approximate occurrences of a short query sequence in a large genome or database of sequences. We first defined the problem by considering various metrics of an approximate occurrence, such as Hamming distance or edit distance. We then considered different methods for computing inexact alignments, including brute-force global and local alignments and seed-and-extend algorithms. Finally we discussed Bowtie, a short-read mapping algorithm based on the Burrows-Wheeler transform, for discovering alignments to a reference genome.
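The suffix array search described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the lecture: the function names and the toy sequence are made up for the example, and the index is built by naively sorting suffixes, which is only practical for short strings.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    # Naive construction by sorting the suffixes; fine for small examples only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, query):
    """Return the sorted start positions where query occurs exactly in text."""
    m = len(query)
    # Binary search for the first suffix whose m-character prefix is >= query.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Binary search for the first suffix whose m-character prefix is > query.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    # Every suffix between the two bounds begins with the query.
    return sorted(suffix_array[start:lo])


genome = "GATTACAGATTACA"  # hypothetical toy sequence
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ATTA"))  # prints [1, 8]
```

Each binary search makes about log2(n) comparisons of at most m characters, compared with up to n * m character comparisons for the brute-force scan.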


Lecture 2: Sequencing Pitfalls

In this session we reviewed the currently available sequencing technologies and best practices, focusing on the widely used Illumina sequencing platform, the up-and-coming PacBio sequencing platform, and the recently announced Oxford Nanopore instruments. Special attention was paid to the complexities and biases of Illumina sequencing.


Lecture 3: Graphs and Genomes

The theme of this class was graphs and methods for graph analysis. The emphasis was on genome assembly, but the class also included a discussion of other biological networks, including PPI networks, regulatory networks, neuron interaction networks, and cell cycle graphs. We considered fundamental properties of graphs, such as nodes, edges, degrees, and shortest paths. We then examined in detail algorithms for searching graphs with breadth-first search, and then approaches for finding minimum-cost paths through weighted graphs (traveling salesman problem), including exhaustive search, greedy algorithms, and branch-and-bound. This led to a discussion of the intractable nature of NP-complete problems and a review of several important examples (vertex cover, clique finding, knapsack problem, Hamiltonian cycle).
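As a rough illustration of the breadth-first search discussed above, the following Python sketch finds a path with the fewest edges in an unweighted graph. The adjacency-list representation, the function name, and the toy graph are assumptions made for this example rather than material from the lecture.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a path with the fewest edges from start to goal, or None."""
    parent = {start: None}  # doubles as the set of visited nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy graph given as adjacency lists.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}
print(bfs_shortest_path(toy_graph, "A", "E"))  # prints ['A', 'B', 'D', 'E']
```

Because nodes are visited in order of their distance from the start, the first time the goal is dequeued the recovered path is a shortest one. This holds only for unweighted graphs; minimum-cost paths in weighted graphs need other methods.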



SBU Graduate Genetics


Next-gen sequence analysis

Rise of sequencing; brute-force matching, binary search, genetics of autism.

Whole genome assembly

Review of *-seq assays, assembly theory, ALLPATHS-LG, Celera Assembler, PacBio.



SBU Intro to Computational Biology


Next-gen sequence analysis

Rise of sequencing; alignment with the BWT; genetics of autism

Lecture notes on the BWT

BWT construction, unwinding, exact match
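The three topics named above can be sketched together in Python. This is a small illustration under stated assumptions, not the content of the notes: the function names are invented here, '$' is assumed to be an end marker that does not occur in the text and sorts before every other character, and the sorted-rotations construction and linear-time counting are chosen for readability rather than speed.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, with '$' appended as end marker."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def lf_tables(last):
    """Return (first-column start row per character, rank of each last-column character)."""
    counts = {}
    ranks = []
    for ch in last:
        ranks.append(counts.get(ch, 0))
        counts[ch] = counts.get(ch, 0) + 1
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    return first, ranks


def inverse_bwt(last):
    """Unwind the transform by following the last-to-first mapping."""
    first, ranks = lf_tables(last)
    row = 0  # row 0 is the rotation that starts with '$'
    chars = []
    while last[row] != "$":
        chars.append(last[row])
        row = first[last[row]] + ranks[row]
    return "".join(reversed(chars))  # characters were collected back to front


def count_matches(last, query):
    """Count exact occurrences of query by backward search over the BWT."""
    first, _ = lf_tables(last)
    lo, hi = 0, len(last)  # half-open range of matching rows
    for ch in reversed(query):
        if ch not in first:
            return 0
        # Naive occurrence counts; a real index precomputes these.
        lo = first[ch] + last[:lo].count(ch)
        hi = first[ch] + last[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)                        # prints annb$aa
print(inverse_bwt(transformed))           # prints banana
print(count_matches(transformed, "ana"))  # prints 2
```

The backward search processes the query from its last character to its first, narrowing a range of rows in the sorted rotation matrix at each step.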



SBU Introduction to Physical and Quantitative Biology


Sequence Alignment and Computational Thinking

Milestones in Molecular Biology and the rise of sequencing. Algorithms for searching and aligning sequences