Schatz Laboratory

New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica

Schatz MC, Maron LG, Stein JC, Hernandez Wences A, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E, Wright MH, Chia JM, Ware D, McCouch SR† and McCombie WR†

The use of high throughput genome-sequencing technologies has uncovered a large extent of structural variation in eukaryotic genomes that makes important contributions to genomic diversity and phenotypic variation. Currently, when the genomes of different strains of a given organism are compared, whole genome resequencing data are aligned to an established reference sequence. However, when the reference differs in significant structural ways from the individuals under study, the analysis is often incomplete or inaccurate. Here, we use rice as a model to explore the extent of structural variation among strains adapted to different ecologies and geographies, and show that this variation can be significant, often matching or exceeding the variation present in closely related human populations or other mammals. We demonstrate how improvements in sequencing and assembly technology allow rapid and inexpensive de novo assembly of next generation sequence data into high-quality assemblies that can be directly compared to provide an unbiased assessment. Using this approach, we are able to accurately assess the “pan-genome” of three divergent rice varieties and document several megabases of each genome absent in the other two. Many of the genome-specific loci are annotated to contain genes, reflecting the potential for new biological properties that would be missed by standard resequencing approaches. We further provide a detailed analysis of several loci associated with agriculturally important traits, illustrating the utility of our approach for biological discovery. All of the data and software are openly available to support further breeding and functional studies of rice and other species.

This page contains pointers to the raw sequencing data, assemblies, annotations, and pan-genome alignments for the Nipponbare (Os-Nipponbare-Draft-CSHL-1.0), IR64 (Os-IR64-Draft-CSHL-1.0), and DJ123 (Os-DJ123-Draft-CSHL-1.0) assemblies. The sequencing data, assemblies, and annotations use standard file formats. The pan-genome alignment information is recorded in BED format: each base of an assembly is classified as either specific to that genome or shared with one or both of the other assemblies, as determined by nucmer. Bases that are 'N's are tagged separately. Note that the annotations and pan-genome alignments were computed from the scaffolds file, while short contigs have been filtered out of the fsa and agp files.
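As a rough illustration of how the pan-genome BED files might be summarized, the sketch below totals the number of bases assigned to each class. It assumes the class label sits in the fourth (name) column of each BED record; the actual column and label strings are not stated on this page, so check a few lines of the file first. The function name `tally_bed_classes` and its `label_column` parameter are made up for this example. BED intervals are 0-based and half-open, so each interval spans `end - start` bases.

```python
import gzip
from collections import Counter


def tally_bed_classes(path, label_column=3):
    """Sum the bases assigned to each label in a (optionally gzipped) BED file.

    label_column is the 0-based index of the column holding the class label.
    Using column 3 (the BED "name" field) is an assumption, not something
    documented for these files.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    totals = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines, comments, and UCSC-style header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) <= label_column:
                continue  # record has no label column; ignore it
            start, end = int(fields[1]), int(fields[2])
            # BED coordinates are 0-based, half-open: length is end - start.
            totals[fields[label_column]] += end - start
    return totals
```

Calling `tally_bed_classes("path/to/pan_genome.bed.gz")` (a placeholder path) returns a `Counter` mapping each label found in the file to its total base count, which can then be compared across the three assemblies.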

	Paper Preprint
	Supplementary Tables and Figures
	Putative Strain Specific Gene Characteristics

Nipponbare

Os-Nipponbare-Draft-CSHL-1.0
March 31, 2014

	scaffolds (.fa.gz)
	annotation (.gff.gz)
	pan genome alignments (.bed.gz)
	contigs (.fsa.gz)
	agp (.agp.gz)

Reads in SRA

library type	read length	Accession
150bp fragment	2x76	SRX032913
180bp fragment	2x100	SRX734432
300bp fragment	2x100	SRX179254
300bp fragment	2x100	SRX179242
2kb jump	2x50	SRX179260
2kb jump	2x36	SRX179259
2kb jump	2x36	SRX179258
5kb jump	2x50	SRX179265
MiSeq pe 450	2x250	SRX179262

Search SRA for CSHL+Nipponbare

IR64

Os-IR64-Draft-CSHL-1.0
March 31, 2014

	scaffolds (.fa.gz)
	annotation (.gff.gz)
	pan genome alignments (.bed.gz)
	contigs (.fsa.gz)
	agp (.agp.gz)

Reads in SRA

library type	read length	Accession
180bp fragment	2x100	SRX180537
300bp fragment	2x100	SRX180556
300bp fragment	2x100	SRX180557
2kb jump	2x50	SRX180555
2kb jump	2x36	SRX180538
5kb jump	2x50	SRX180597
MiSeq pe 450	2x250	SRX180591

Search SRA for CSHL+IR64

DJ123

Os-DJ123-Draft-CSHL-1.0
March 31, 2014

	scaffolds (.fa.gz)
	annotation (.gff.gz)
	pan genome alignments (.bed.gz)
	contigs (.fsa.gz)
	agp (.agp.gz)

Reads in SRA

library type	read length	Accession
180bp fragment	2x100	SRX180718
300bp fragment	2x100	SRX180752
300bp fragment	2x100	SRX180754
2kb jump	2x50	SRX180822
2kb jump	2x36	SRX180755
5kb jump	2x50	SRX180892
MiSeq pe 450	2x250	SRX186093

Search SRA for CSHL+DJ123
