New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica

Schatz MC, Maron LG, Stein JC, Hernandez Wences A, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E, Wright MH, Chia JM, Ware D, McCouch SR† and McCombie WR†

The use of high-throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole genome resequencing data are aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next-generation sequence data into high-quality assemblies that can be directly compared to provide an unbiased assessment. Using this approach, we are able to accurately assess the “pan-genome” of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard resequencing approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.

This page contains pointers to the raw sequencing data, assemblies, annotations, and pan-genome alignments for the Nipponbare (Os-Nipponbare-Draft-CSHL-1.0), IR64 (Os-IR64-Draft-CSHL-1.0), and DJ123 (Os-DJ123-Draft-CSHL-1.0) assemblies. The sequencing data, assemblies, and annotations use standard file formats. The pan-genome alignment information is recorded in BED format: each base of an assembly is classified as either specific to that genome or shared with one or both of the other assemblies, as determined by nucmer alignments. Bases that are 'N's are tagged separately. Note that the annotations and pan-genome alignments were computed from the scaffolds file, whereas short contigs have been filtered out of the fsa and agp files.
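As a rough illustration of how the pan-genome BED files might be summarized, the sketch below totals the number of bases per classification. It assumes a tab-delimited BED file whose fourth column holds the classification label; the label strings used in the example (`specific`, `shared`, `N`) and the function names are hypothetical and are not taken from the released files, so check the actual column contents before relying on it.

```python
import gzip
from collections import Counter


def bases_per_label(bed_lines):
    """Sum interval lengths per label.

    Assumes tab-delimited BED records with the classification label in
    the 4th column (an assumption, not confirmed by the data release).
    """
    totals = Counter()
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no label column on this record
        start, end = int(fields[1]), int(fields[2])
        # BED intervals are 0-based and half-open, so length = end - start.
        totals[fields[3]] += end - start
    return dict(totals)


def bases_per_label_from_file(path):
    """Convenience wrapper for a gzip-compressed BED file (.bed.gz)."""
    with gzip.open(path, "rt") as handle:
        return bases_per_label(handle)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [
        "scaffold_1\t0\t100\tspecific\n",
        "scaffold_1\t100\t250\tshared\n",
        "scaffold_2\t0\t10\tN\n",
        "scaffold_2\t10\t60\tspecific\n",
    ]
    print(bases_per_label(example))
```

Calling `bases_per_label_from_file` on one of the downloaded `.bed.gz` files would give per-label base totals for that assembly, provided the file follows the layout assumed here.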

 Paper Preprint
 Supplementary Tables and Figures
 Putative Strain Specific Gene Characteristics


Nipponbare

Os-Nipponbare-Draft-CSHL-1.0
March 31, 2014

 scaffolds (.fa.gz)
 annotation (.gff.gz)
 pan genome alignments (.bed.gz)
 contigs (.fsa.gz)
 agp (.agp.gz)

Reads in SRA

 Library type     Read length   Accession
 150bp fragment   2x76          SRX032913
 180bp fragment   2x100         SRX734432
 300bp fragment   2x100         SRX179254
 300bp fragment   2x100         SRX179242
 2kb jump         2x50          SRX179260
 2kb jump         2x36          SRX179259
 2kb jump         2x36          SRX179258
 5kb jump         2x50          SRX179265
 MiSeq pe 450     2x250         SRX179262

Search SRA for CSHL+Nipponbare



IR64

Os-IR64-Draft-CSHL-1.0
March 31, 2014

 scaffolds (.fa.gz)
 annotation (.gff.gz)
 pan genome alignments (.bed.gz)
 contigs (.fsa.gz)
 agp (.agp.gz)

Reads in SRA

 Library type     Read length   Accession
 180bp fragment   2x100         SRX180537
 300bp fragment   2x100         SRX180556
 300bp fragment   2x100         SRX180557
 2kb jump         2x50          SRX180555
 2kb jump         2x36          SRX180538
 5kb jump         2x50          SRX180597
 MiSeq pe 450     2x250         SRX180591

Search SRA for CSHL+IR64



DJ123

Os-DJ123-Draft-CSHL-1.0
March 31, 2014

 scaffolds (.fa.gz)
 annotation (.gff.gz)
 pan genome alignments (.bed.gz)
 contigs (.fsa.gz)
 agp (.agp.gz)

Reads in SRA

 Library type     Read length   Accession
 180bp fragment   2x100         SRX180718
 300bp fragment   2x100         SRX180752
 300bp fragment   2x100         SRX180754
 2kb jump         2x50          SRX180822
 2kb jump         2x36          SRX180755
 5kb jump         2x50          SRX180892
 MiSeq pe 450     2x250         SRX186093

Search SRA for CSHL+DJ123