New whole genome de novo assemblies of three divergent strains of rice (O. sativa) document novel gene space of aus and indica
Schatz MC, Maron LG, Stein JC, Hernandez Wences A, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E, Wright MH, Chia JM, Ware D, McCouch SR† and McCombie WR†
The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole genome resequencing data are aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared to provide an unbiased assessment. Using this approach, we are able to accurately assess the “pan-genome” of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard resequencing approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.
This page contains pointers to the raw sequencing data, assemblies, annotations, and pan-genome alignments for the
Nipponbare (Os-Nipponbare-Draft-CSHL-1.0), IR64 (Os-IR64-Draft-CSHL-1.0), and DJ123 (Os-DJ123-Draft-CSHL-1.0) assemblies.
The sequencing data, assemblies, and annotations use standard file formats. The pan-genome alignment information
is recorded in BED format: each base of the assembly is classified as specific to that genome or shared with one or
both of the other assemblies, according to nucmer. Bases that are 'N's are tagged separately. Note that the annotations
and pan-genome alignments were computed from the scaffolds file, while short contigs have been filtered out of the fsa and agp files.
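As a rough illustration of how the pan-genome BED files might be summarized, the sketch below adds up the number of bases assigned to each class. It assumes that the class label sits in the fourth (name) column of each record; the label strings and scaffold names in the example are hypothetical placeholders, not the ones used in the released files, so check an actual file before relying on them. BED intervals are 0-based and half-open, so each record covers end minus start bases.

```python
from collections import Counter
from typing import Iterable


def tally_bed_classes(lines: Iterable[str]) -> Counter:
    """Sum the bases covered by each label found in the 4th BED column.

    Assumption: the classification label is stored in column 4 (name).
    """
    totals: Counter = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # no label column on this record
        start, end = int(fields[1]), int(fields[2])
        # BED coordinates are 0-based, half-open: length is end - start.
        totals[fields[3]] += end - start
    return totals


# Hypothetical records; real scaffold names and labels will differ.
example = [
    "scaffold_1\t0\t100\tshared_all",
    "scaffold_1\t100\t150\tspecific",
    "scaffold_1\t150\t160\tN",
    "scaffold_2\t0\t40\tspecific",
]
totals = tally_bed_classes(example)
for label, bases in sorted(totals.items()):
    print(label, bases)
```

To run this on a downloaded file, pass an open file handle instead of the example list, for instance `tally_bed_classes(open(path))`, where `path` points at one of the pan-genome BED files.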
Nipponbare
Os-Nipponbare-Draft-CSHL-1.0 March 31, 2014
Reads in SRA
Search SRA for CSHL+Nipponbare
IR64
Os-IR64-Draft-CSHL-1.0 March 31, 2014
Reads in SRA
Search SRA for CSHL+IR64
DJ123
Os-DJ123-Draft-CSHL-1.0 March 31, 2014
Reads in SRA
Search SRA for CSHL+DJ123