
The genome of the mesopolyploid crop species Brassica rapa

The Brassica rapa Genome Sequencing Project Consortium

A full list of members appears at the end of the paper.

Nature Genetics, Volume 43, Number 10, October 2011, page 1035 (Letters). Received 7 March 2011; accepted 3 August 2011; published online 28 August 2011; doi:10.1038/ng.919. © 2011 Nature America, Inc. All rights reserved.

We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein-coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.

Model species have provided valuable insights into angiosperm (flowering plant) genome structure, function and evolution. For example, A. thaliana has experienced two genome duplications since its divergence from Carica, with rapid DNA sequence divergence, extensive gene loss and fractionation of ancestral gene order eroding the resemblance of A. thaliana to ancestral Brassicales (ref. 1). Compared with an ancestor of just a few million years ago, A. thaliana has undergone a ~30% reduction in genome size (ref. 2) and 9–10 chromosomal rearrangements (refs. 3,4) that differentiate it from its sister species Arabidopsis lyrata.

Whole-genome duplication has been observed in all plant genomes sequenced to date. A. thaliana has undergone three paleo-polyploidy events (ref. 5): a paleohexaploidy (γ) event shared with most dicots (asterids and rosids) and two paleotetraploidy events (β then α) shared with other members of the order Brassicales. B. rapa shares this complex history but with the addition of a whole-genome triplication (WGT) thought to have occurred between 13 and 17 million years ago (MYA) (refs. 6,7), making ‘mesohexaploidy’ a characteristic of the Brassiceae tribe of the Brassicaceae (ref. 8).

Brassica crops are used for human nutrition and provide opportunities for the study of genome evolution. These crops include important vegetables (B. rapa (Chinese cabbage, pak choi and turnip) and Brassica oleracea (broccoli, cabbage and cauliflower)) as well as oilseed crops (Brassica napus, B. rapa, Brassica juncea and Brassica carinata), which collectively provide 12% of the world’s edible vegetable oil production (ref. 9). The six widely cultivated Brassica species are also a classical example of the importance of polyploidy in botanical evolution, described by ‘U’s triangle’ (ref. 10), with the three diploid species B. rapa (A genome), Brassica nigra (B genome) and B. oleracea (C genome) having formed the amphidiploid species B. juncea (A and B genomes), B. napus (A and C genomes) and B. carinata (B and C genomes) by hybridization. Comparative physical mapping studies have confirmed genome triplication in a common ancestor of B. oleracea (ref. 11) and B. rapa (ref. 12) since its divergence from the A. thaliana lineage at least 13–17 MYA (refs. 6,7,13).

Using 72× coverage of paired short read sequences generated by Illumina GA II technology and stringent assembly parameters, we assembled the genome of the B. rapa ssp. pekinensis line Chiifu-401-42 and analyzed the assembly (Online Methods and Supplementary Note). The final assembly statistics are summarized in Table 1. The assembled sequence of 283.8 Mb was estimated to cover >98% of the gene space (Supplementary Table 1) and is greater than the previously estimated size of the euchromatic space, 220 Mb (ref. 14). The assembly showed excellent agreement with the previously reported chromosome A03 (ref. 15) and with 647 bacterial artificial chromosomes (BACs) (ref. 14) (Online Methods) sequenced by Sanger technology. Integration with 199,452 BAC-end sequences produced 159 super scaffolds representing 90% of the assembled sequences, with an N50 scaffold size of 1.97 Mb (the N50 scaffold size is a weighted median statistic indicating that 50% of the entire assembly is contained in scaffolds equal to or larger than this value). Genetic mapping of 1,427 markers in B. rapa allowed us to produce ten pseudochromosomes that included 90% of the assembly (Supplementary Table 2).

We found the difference in the physical sizes of the A. thaliana and B. rapa genomes to be largely due to transposable elements (Supplementary Table 3). Although widely dispersed throughout the genome, as shown in Figure 1, the transposon-related sequences were most abundant in the vicinity of the centromeres. We estimated that transposon-related sequences occupy 39.5% of the genome, with the proportions of retrotransposons (with long terminal repeats), DNA transposons and long interspersed elements being 27.1%, 3.2% and 2.8%, respectively (Supplementary Tables 4 and 5).

We modeled and analyzed protein-coding genes (described in the Online Methods and the Supplementary Note). We identified 41,174 protein-coding genes, distributed as shown in Figure 1. The gene models have an average transcript length of 2,015 bp, a coding length of 1,172 bp and a mean of 5.03 exons per gene, all similar to those observed in A. thaliana (ref. 16). A total of 95.8% of gene models have a match in at least one of the public protein databases and 99.3% are represented among the public EST collections or de novo Illumina mRNA-Seq data. Among the total 16,917 B. rapa gene families, only 1,003 (5.9%) appear to be lineage specific, with 15,725 (93.0%) shared with A. thaliana (ref. 16) and 9,909 (58.6%) also shared by Carica papaya (ref. 17) and Vitis vinifera (ref. 18) (Fig. 2).

We analyzed the organization and evolution of the genome (as described in the Online Methods and the Supplementary Note). B. rapa’s close relationship to A. thaliana allows Arabidopsis to be used as an outgroup for investigating the adaptation of the Brassica lineage to the triplicated state. In total, 108.6 Mb (90.01%) of the A. thaliana genome and 259.6 Mb (91.13%) of the B. rapa genome assembly were contained within collinear blocks. We confirmed the almost complete triplication of the B. rapa genome relative to A. thaliana (Fig. 3) and (by inference) to the postulated Brassicaceae ancestral genome (n = 8). The gene paralogs anchored in the triplicated segments (Supplementary Fig. 1) and their orthologs (Supplementary Table 6) dated the mesohexaploidy event to between 5 and 9 MYA (Supplementary Fig. 2), which is more recent than has been reported previously (ref. 13).

The Brassica mesohexaploidy offers an opportunity to study gene retention in triplicated genomes. Assuming an initial count of protein-coding genes similar to that of A. thaliana (around 30,000), the newly formed hexaploid would have had about 90,000 genes, of which we can now identify only 41,174. This is typical of the substantial gene loss that occurs following polyploid formation in eukaryotes (refs. 19–21). We identified each of the orthologous blocks in the B. rapa genome corresponding to ancestral blocks using collinearity between orthologs on the genomes of B. rapa and A. thaliana and found significant disparity in gene loss across the triplicated blocks (Supplementary Fig. 3). Of the 21 regions of conserved synteny, 20 showed significant deviations from equivalent gene frequencies (P < 0.05) (Supplementary Fig. 4). To illustrate this variation, we concatenated the least fractionated blocks (LF), the medium fractionated blocks (MF1) and the most fractionated blocks (MF2) and calculated the proportions of genes retained in each of these sub-genomes relative to A. thaliana. The LF sub-genome retains 70% of the genes found in A. thaliana, whereas the MF1 and MF2 sub-genomes retain substantially lower proportions (46% and 36%, respectively; Fig. 4). Based on the analysis of synonymous base substitution rates (Ks values), the pairwise divergences between the three sub-genomes are indistinguishable from each other (Supplementary Table 7). Our observation of differentially fractionated sub-genomes is consistent with the hypothesis that the sub-genomes MF1 and MF2 underwent substantial fractionation in a tetraploid nucleus before fractionation commenced in the LF genome in a more recently formed hexaploid. However, biased fractionation following tetraploidy (albeit less extreme than we observed) has been reported in A. thaliana (ref. 22) and maize (ref. 23), where it was hypothesized to be the result of differential epigenetic marking of the parent genomes (resulting in differential gene silencing and consequential fractionation), representing an alternative hypothesis.

The retention of extensive collinear genome blocks provides a potential opportunity for ectopic DNA recombination.
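As a brief aside, the gene-loss arithmetic reported above (about 30,000 ancestral genes, a threefold expectation of about 90,000 and 41,174 genes identified) can be checked with a short Python sketch. The function names, the example per-block counts and the use of a chi-square goodness-of-fit statistic are illustrative assumptions; the text does not state which test was used to detect deviation from equivalent gene frequencies.

```python
# Illustrative sketch only. Function names and the example block counts
# are hypothetical; they are not taken from the paper's methods.

def expected_genes_after_triplication(ancestral_genes, copies=3):
    """Gene count expected immediately after whole-genome triplication."""
    return ancestral_genes * copies


def retained_fraction(observed, expected):
    """Fraction of the expected gene complement that is still identifiable."""
    return observed / expected


def chi_square_equal_retention(counts):
    """Goodness-of-fit statistic against equal retention in every copy."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Standard 5% critical value of the chi-square distribution, 2 degrees of freedom.
CHI2_CRITICAL_DF2_P05 = 5.991

expected = expected_genes_after_triplication(30_000)  # about 90,000 genes
overall = retained_fraction(41_174, expected)         # roughly 0.46

# Hypothetical retained-gene counts for one triplicated block (LF, MF1, MF2).
block_counts = [700, 460, 360]
stat = chi_square_equal_retention(block_counts)
biased = stat > CHI2_CRITICAL_DF2_P05

print(f"expected genes: {expected}")
print(f"overall retained fraction: {overall:.3f}")
print(f"chi-square statistic: {stat:.1f}, biased retention: {biased}")
```

With real data, the three counts would be the numbers of A. thaliana orthologs retained in each copy of a given conserved block, and the comparison would be repeated for each of the 21 regions.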
By finding and comparing homologous gene quartets, including two α or β duplicates in Brassica and their respective orthologs in Arabidopsis, we noted that, respectively, 25% and 30% of Brassica and Arabidopsis duplicates are more similar to their intragenomic paralog than to their intergenomic ortholog, suggesting appreciable gene conversion since the divergence of these lineages (Supplementary Note). The sizes of the affected regions vary from 10 bp to >2 kb, with a majority of these apparent conversion events occurring in parallel in both species. Genes proximal to telomeres tend to have lower nucleotide substitution rates than distal genes (P = 0.0004), which is likely to be a result of higher conversion rates in the former and is consistent with prior findings in grasses (refs. 24,25).

The gene dosage hypothesis (ref. 26) predicts that gene functional categories encoding products that interact with one another or in networks should be over-retained and genes with products that do not interact with other gene products should be under-retained. In accordance with this hypothesis, we found B. rapa transcription factors with a detectable ortholog in A. thaliana to be significantly over-retained (Supplementary Table 8 and Supplementary Note). We obtained similarly consistent results for genes encoding known protein subunits of cytoplasmic ribosomes and for genes known to be involved with the proteasome. We found under-retention of genes encoding products with few interactions, specifically those associated with DNA repair, nuclease activity, binding and the chloroplast (Supplementary Table 9).

Table 1 Summary of the final assembly statistics

                        Contig size (bp)  Contig number  Scaffold size (bp)  Scaffold number
N90                                5,593         10,564             357,979              159
N80                               10,984          7,292             773,703              104
N70                               15,947          5,308           1,257,653               77
N60                               21,229          3,874           1,452,355               56
N50                               27,294          2,778           1,971,137               39
Total size                   264,110,991                        283,823,632
Total number (>100 bp)                           60,521                               40,549
Total number (>2 kb)                             14,207                                  794

Figure 1 Chromosomal distribution of the main B. rapa genome features. Area charts quantify retrotransposons, genes (exons and introns) and DNA transposons for chromosomes A01 to A10. The x axis denotes the physical position along the B. rapa chromosomes in units of million (M) bases.

Figure 2 Venn diagram showing unique and shared gene families between and among four sequenced dicotyledonous species (B. rapa, A. thaliana, C. papaya and V. vinifera). Figure labels give two counts per species: C. papaya, 19,093 and 13,533; A. thaliana, 29,139 and 16,985; B. rapa, 32,543 and 16,917; V. vinifera, 22,608 and 13,810.

The Gene Ontology annotation classes of over-retained genes suggest that genome triplication may have expanded gene families that underlie environmental adaptability, as observed in other polyploid species (ref. 27). Genes with Gene Ontology terms associated with response to important environmental factors, including salt, cold, osmotic stress, light, wounding, pathogen (broad spectrum) defense and both cadmium and zinc ions, were over-retained (Fig. 5). Genes responding to plant hormones (jasmonic acid, auxin, salicylic acid, ethylene, brassinosteroid, cytokinin and abscisic acid) were also over-retained.

Under selection, Brassica species have a remarkable propensity for the development of morphological variants (ref. 28); we analyzed factors potentially involved in this development (Supplementary Note). One factor may be a general acceleration of nucleotide substitution rates. For 2,275 orthologous groups of genes in B. rapa, A. thaliana, papaya and grape (Supplementary Table 10), the nucleotide substitution rates in B. rapa were greater than in the other plants, with average Ks (the number of synonymous substitutions per synonymous site) and Ka (the number of non-synonymous substitutions per non-synonymous site) values 69% and 24%, respectively, greater than those of papaya and 1% and 7%, respectively, greater than those of A. thaliana (Supplementary Table 11). The much slower evolutionary rate in papaya may be explained by its longer generation time as a perennial. Another factor may be expansion of auxin-related gene families, as auxin controls many plant growth and morphological developmental processes (refs. 29–31). We identified 347 B. rapa genes related to auxin synthesis, transport, signal transduction and inactivation, in contrast to 187 such genes present in A. thaliana (Supplementary Tables 12 and 13 and Supplementary Figs. 5–14). The TCP gene family is important in the evolution and specification of plant morphology (ref. 32). This family has been amplified in B. rapa, which contains 39 TCP genes, more than A. thaliana (24), grape (19) or papaya (21) (Supplementary Fig. 15).

The regulation of flowering is key to many Brassica morphotypes. Mesohexaploidy has had contrasting effects on the genes involved. FLC (FLOWERING LOCUS C) (ref. 33) has three orthologs in B. rapa as a consequence of the WGT (Supplementary Fig. 16). Likewise, five of six B. rapa VRN1 (VERNALIZATION1) genes (ref. 34) produced by the WGT have been preserved (Supplementary Fig. 17). However, GI (GIGANTEA) genes (ref. 35) have been limited to only one copy (Supplementary Fig. 18), as have the SVP (SHORT VEGETATIVE PHASE) genes (ref. 36) (Supplementary Fig. 19) and each of the three COL (CONSTANS-LIKE) genes (ref. 37) (Supplementary Fig. 20).

Figure 3 Segmental collinearity of the genomes of B. rapa and A. thaliana. Conserved collinear blocks of gene models are shown between the ten chromosomes of the B. rapa genome (horizontal axis) and the five chromosomes of the A. thaliana genome (vertical axis). These blocks are labeled A to X and are color coded by inferred ancestral chromosome following established convention.

Figure 4 The density of orthologous genes in three subgenomes (LF, MF1 and MF2) of B. rapa compared to A. thaliana. The x axis denotes the physical position of each A. thaliana gene locus (chromosomes 1 to 5, in Mb). The y axis denotes the percentage of retained orthologous genes in B. rapa subgenomes around each A. thaliana gene, where 500 genes flanking each side of a given gene locus were analyzed, giving a total window size of 1,001 genes.

The comparison of the genomes of B. rapa and A. thaliana, as for previous comparisons of the cereals sorghum and rice (ref. 38), sheds new light on genome evolution in plants important for human nutrition. Our growing understanding of the processes shaping the triplicated genome of the mesopolyploid B. rapa is of relevance not only for closely related crop species, such as B. oleracea and B. napus, but also for other important crops with triplicated genomes, such as bread wheat.

URLs.
Brassica info, http://www.brassica.info/; GenoScope database, http://www.genoscope.cns.fr/externe/GenomeBrowser/Vitis/; Hawaii Papaya Genome Project, http://asgpb.mhpcc.hawaii.edu/papaya/; Arabidopsis Information Resource, http://www.arabidopsis.org/.

METHODS
Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturegenetics/.

Accession codes. This whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AENI00000000. The version described in this paper is the first version, AENI01000000. Full annotation is available at http://brassicadb.org/.

Note: Supplementary information is available on the Nature Genetics website.

ACKNOWLEDGMENTS
This work was primarily funded by the Chinese Ministry of Science and Technology, Ministry of Agriculture, Ministry of Finance and the National Natural Science Foundation of China. Other funding sources included: Core Research Budget of the Non-profit Governmental Research Institution; the European Union 7th Framework Project; funds from Shenzhen Municipal Government of China; the Danish Natural Science Research Council; National Academy of Agricultural Science and the Next-Generation Biogreen21 Program, Rural Development Administration, Korea; the Technology Development Program for Agriculture and Forestry, Ministry for Food, Agriculture, Forestry and Fisheries, Korea; United Kingdom’s Biotechnology and Biological Sciences Research Council; Institut National de la Recherche Agronomique, France; Japanese Kazusa DNA Research Institute Foundation; National Science Foundation, USA; Bielefeld University, Germany; the Australian Research Council; the Australian Grains Research and Development Corporation; Agriculture and Agri-Food Canada; and the National Research Council of Canada’s Plant Biotechnology Institute. See the Supplementary Note for a full list of support and acknowledgments.

AUTHOR CONTRIBUTIONS
Principal investigators: Xiaowu Wang, J. Wu, S.L., Y.B., J.-H.M. and I.B. DNA and transcriptome sequencing: Bo Wang (group leader), Xiaowu Wang (group leader), B.C. (group leader), Jun Wang (BGI), K.W., J. Wu, S.L., W.H., B.-S.P., I.B., D.E., I.A.P.P., J.-H.M., H.A., Bernd Weisshaar, Shusei Sato, H.H., S.T., A.G.S., Y. Lim,