Recent advances in DNA sequencing technology are dramatically changing the scale and scope of modern molecular biology. Next-generation sequencing instruments can sequence the equivalent of a human genome in a few days and at low cost, compared with the years of effort and billions of dollars spent to sequence the first human genome. This dramatic increase in efficiency has spurred tremendous growth in applications of DNA sequencing. For example, whereas the Human Genome Project sought to sequence the genomes of a small group of individuals, the 1000 Genomes Project aims to catalog the genomes of more than 1000 individuals from around the globe.

Our research focuses on developing scalable algorithms and systems for analyzing DNA sequences, concentrating on the assembly and alignment of next-generation sequencing reads and on related downstream analyses. These systems have been used to reconstruct the genomes of previously unsequenced organisms, to probe sequence variation, and to explore a host of biological features across the tree of life. Looking ahead, one of the main challenges facing computational biologists is building analysis systems whose efficiency keeps pace with the dramatic improvements in sequencing throughput. Accordingly, we are particularly interested in capitalizing on the latest advances in distributed and parallel computing to advance the state of the art in bioinformatics and genomics.

