© 2011 Nature America, Inc. All rights reserved.
Nature Genetics VOLUME 43 | NUMBER 10 | OCTOBER 2011 1035
LETTERS
Brassica nigra (B genome) and B. oleracea (C genome) having formed
the amphidiploid species B. juncea (A and B genomes), B. napus
(A and C genomes) and B. carinata (B and C genomes) by hybridiza-
tion. Comparative physical mapping studies have confirmed genome
triplication in a common ancestor of B. oleracea11 and B. rapa12 since
its divergence from the A. thaliana lineage at least 13–17 MYA6,7,13.
Using 72× coverage of paired short read sequences generated by
Illumina GA II technology and stringent assembly parameters, we
assembled the genome of the B. rapa ssp. pekinensis line Chiifu-401-42
and analyzed the assembly (Online Methods and Supplementary Note).
The final assembly statistics are summarized in Table 1. The assembled
sequence of 283.8 Mb was estimated to cover >98% of the gene space
(Supplementary Table 1) and is greater than the previous estimated
size of the euchromatic space, 220 Mb14. The assembly showed excellent
agreement with the previously reported chromosome A03 (ref. 15) and
with 647 bacterial artificial chromosomes (BACs)14 (Online Methods)
sequenced by Sanger technology. Integration with 199,452 BAC-end
sequences produced 159 super scaffolds representing 90% of the assembled
sequences, with an N50 scaffold size of 1.97 Mb (N50 is a weighted median
statistic: 50% of the entire assembly is contained in scaffolds equal to
or larger than this value). Genetic mapping of 1,427 markers in B. rapa
allowed us to produce ten pseudochromosomes that included 90% of the
assembly (Supplementary Table 2).
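The N50 definition above, and the N60 to N90 rows of Table 1, can be illustrated with a minimal sketch. The function name `nx` and the toy lengths are hypothetical and are not taken from the assembly pipeline used in this work.

```python
def nx(lengths, percent=50):
    """Return the Nx statistic of a collection of sequence lengths (bp).

    Nx is the length L such that sequences of length >= L together
    contain at least `percent` percent of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length


# Hypothetical toy scaffold lengths (bp), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print("N50:", nx(toy_scaffolds, 50))
print("N90:", nx(toy_scaffolds, 90))
```

Sorting from longest to shortest and accumulating until the chosen fraction of the total is reached is the usual way to compute these statistics; the number of sequences consumed at that point corresponds to the "number" columns of Table 1.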
We found the difference in the physical sizes of the A. thaliana
and B. rapa genomes to be largely attributable to transposable elements
(Supplementary Table 3). Although widely dispersed throughout the
genome, as shown in Figure 1, the transposon-related sequences were
most abundant in the vicinity of the centromeres. We estimated that
transposon-related sequences occupy 39.5% of the genome, with the
proportions of retrotransposons (with long terminal repeats), DNA
transposons and long interspersed elements being 27.1%, 3.2% and
2.8%, respectively (Supplementary Tables 4 and 5).
We modeled and analyzed protein coding genes (described in the
Online Methods and the Supplementary Note). We identified 41,174
protein coding genes, distributed as shown in Figure 1. The gene models
have an average transcript length of 2,015 bp, a coding length of 1,172
bp and a mean of 5.03 exons per gene, all similar to those observed in
A. thaliana16. A total of 95.8% of gene models have a match in at least
one of the public protein databases and 99.3% are represented among
the public EST collections or de novo Illumina mRNA-Seq data. Among
the total 16,917 B. rapa gene families, only 1,003 (5.9%) appear to be
lineage specific, with 15,725 (93.0%) shared with A. thaliana16 and 9,909
(58.6%) also shared by Carica papaya17 and Vitis vinifera18 (Fig. 2).
The genome of the mesopolyploid crop species Brassica rapa
The Brassica rapa Genome Sequencing Project Consortium
We report the annotation and analysis of the draft genome
sequence of Brassica rapa accession Chiifu-401-42, a Chinese
cabbage. We modeled 41,174 protein coding genes in the
B. rapa genome, which has undergone genome triplication.
We used Arabidopsis thaliana as an outgroup for investigating
the consequences of genome triplication, such as structural
and functional evolution. The extent of gene loss (fractionation)
among triplicated genome segments varies, with one of the
three copies consistently retaining a disproportionately large
fraction of the genes expected to have been present in its
ancestor. Variation in the number of members of gene families
present in the genome may contribute to the remarkable
morphological plasticity of Brassica species. The B. rapa
genome sequence provides an important resource for studying
the evolution of polyploid genomes and underpins the genetic
improvement of Brassica oil and vegetable crops.
Model species have provided valuable insights into angiosperm
(flowering plant) genome structure, function and evolution. For example,
A. thaliana has experienced two genome duplications since its divergence
from Carica, with rapid DNA sequence divergence, extensive gene loss
and fractionation of ancestral gene order eroding the resemblance of
A. thaliana to ancestral Brassicales1. Compared with an ancestor that lived
just a few million years ago, A. thaliana has undergone a ~30% reduction in
genome size2 and 9–10 chromosomal rearrangements3,4 that differentiate
it from its sister species Arabidopsis lyrata. Whole-genome duplication
has been observed in all plant genomes sequenced to date. A. thaliana has
undergone three paleo-polyploidy events5: a paleohexaploidy (γ) event
shared with most dicots (asterids and rosids) and two paleotetraploidy
events (β then α) shared with other members of the order Brassicales.
B. rapa shares this complex history but with the addition of a whole-
genome triplication (WGT) thought to have occurred between 13 and
17 million years ago (MYA)6,7, making ‘mesohexaploidy’ a characteristic
of the Brassiceae tribe of the Brassicaceae8.
Brassica crops are used for human nutrition and provide opportuni-
ties for the study of genome evolution. These crops include important
vegetables (B. rapa (Chinese cabbage, pak choi and turnip) and Brassica
oleracea (broccoli, cabbage and cauliflower)) as well as oilseed crops
(Brassica napus, B. rapa, Brassica juncea and Brassica carinata), which
collectively provide 12% of the world’s edible vegetable oil production9.
The six widely cultivated Brassica species are also a classical example
of the importance of polyploidy in botanical evolution, described by
‘U’s triangle’10, with the three diploid species B. rapa (A genome),
A full list of members appears at the end of the paper.
Received 7 March 2011; accepted 3 August 2011; published online 28 August 2011; doi:10.1038/ng.919
We analyzed the organization and evolution of the genome (as
described in the Online Methods and the Supplementary Note).
B. rapa’s close relationship to A. thaliana allows Arabidopsis to be used
as an outgroup for investigating the adaptation of the Brassica lineage
to the triplicated state. In total, 108.6 Mb (90.01%) of the A. thaliana
genome and 259.6 Mb (91.13%) of the B. rapa genome assembly were
contained within collinear blocks. We confirmed the almost complete
triplication of the B. rapa genome relative to A. thaliana (Fig. 3) and (by
inference) to the postulated Brassicaceae ancestral genome (n = 8). The
gene paralogs anchored in the triplicated segments (Supplementary
Fig. 1) and their orthologs (Supplementary Table 6) dated the meso-
hexaploidy event to between 5 and 9 MYA (Supplementary Fig. 2),
which is more recent than has been reported previously13.
The Brassica mesohexaploidy offers an opportunity to study gene
retention in triplicated genomes. Assuming an initial count of protein
coding genes similar to that of A. thaliana (around 30,000), the newly
formed hexaploid would have about 90,000 genes, of which we can now
identify only 41,174. This is typical of the substantial gene loss that occurs
following polyploid formation in eukaryotes19–21. We identified each of
the orthologous blocks in the B. rapa genome corresponding to ancestral
blocks using collinearity between orthologs on the genomes of B. rapa
and A. thaliana and found significant disparity in gene loss across the
triplicated blocks (Supplementary Fig. 3). Of the 21 regions of conserved
synteny, 20 showed significant deviations from equivalent gene frequen-
cies (P < 0.05) (Supplementary Fig. 4). To illustrate this variation, we
concatenated the least fractionated blocks (LF), the medium fractionated
blocks (MF1) and the most fractionated blocks (MF2) and calculated
the proportions of genes retained in each of these sub-genomes relative
to A. thaliana. The LF sub-genome retains 70% of the genes found in
A. thaliana, whereas the MF1 and MF2 sub-genomes retain substantially
lower proportions (46% and 36%, respectively; Fig. 4).
Based on the analysis of synonymous base substitution rates (Ks values),
the pairwise divergences between the three sub-genomes are indistin-
guishable from each other (Supplementary Table 7). Our observation of
differentially fractionated sub-genomes is consistent with the hypothesis
that the sub-genomes MF1 and MF2 underwent substantial fractiona-
tion in a tetraploid nucleus before fractionation commenced in the LF
genome in a more recently formed hexaploid. However, biased fractiona-
tion following tetraploidy (albeit less extreme than we observed) has
been reported in A. thaliana22 and maize23, where it was hypothesized
to be the result of differential epigenetic marking of the parent genomes
(resulting in differential gene silencing and consequential fractionation),
which represents an alternative hypothesis.
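The test of "equivalent gene frequencies" across the three copies of a triplicated region can be sketched as follows. This section does not name the statistical test that was used, so the chi-square goodness-of-fit test below is an assumption, and the retained-gene counts are hypothetical values chosen only to mirror the genome-wide proportions quoted above.

```python
import math


def chi_square_equal_retention(retained_counts):
    """Chi-square goodness-of-fit test of equal retention across three copies.

    `retained_counts` holds the number of genes retained in each of the
    three copies of one triplicated region. Returns (statistic, p_value).
    With three categories there are 2 degrees of freedom, for which the
    chi-square survival function is exactly exp(-x / 2).
    """
    if len(retained_counts) != 3:
        raise ValueError("expected counts for exactly three copies")
    total = sum(retained_counts)
    if total == 0:
        raise ValueError("no retained genes to test")
    expected = total / 3
    statistic = sum((obs - expected) ** 2 / expected for obs in retained_counts)
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Hypothetical counts for one region with 100 ancestral genes.
statistic, p_value = chi_square_equal_retention([70, 46, 36])
print(f"chi-square = {statistic:.2f}, P = {p_value:.4f}")
```

A region would be counted as deviating from equal retention when the P value falls below 0.05, the threshold quoted in the text.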
The retention of extensive collinear genome blocks provides a
potential opportunity for ectopic DNA recombination. By finding and
comparing homologous gene quartets, including two α or β duplicates
in Brassica and their respective orthologs in Arabidopsis, we noted that,
respectively, 25% and 30% of Brassica and Arabidopsis duplicates are
more similar to their intragenomic paralog than to their intergenomic
ortholog, suggesting appreciable gene conversion since the divergence of
these lineages (Supplementary Note). The sizes of the affected regions
vary from 10 bp to >2 kb, with a majority of these apparent conversion
events occurring in parallel in both species. Genes proximal to telo-
meres tend to have lower nucleotide substitution rates than distal genes
(P = 0.0004), which is likely to be a result of higher conversion rates in
the former and is consistent with prior findings in grasses24,25.
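The quartet comparison described above can be sketched under simplifying assumptions. Because the α and β duplications predate the divergence of the two lineages, a duplicate is expected to resemble its ortholog more than its paralog; the reverse pattern suggests conversion. The identity measure, the function names and the toy aligned sequences below are all hypothetical, and the published analysis (Supplementary Note) may use a different similarity measure.

```python
def identity(seq_a, seq_b):
    """Fraction of identical positions between two aligned sequences.

    Columns in which either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(x == y for x, y in columns) / len(columns)


def more_similar_to_paralog(gene, paralog, ortholog):
    """True if `gene` is closer to its intragenomic paralog than to its
    intergenomic ortholog, the pattern taken here to suggest conversion."""
    return identity(gene, paralog) > identity(gene, ortholog)


# Hypothetical aligned toy sequences for one quartet:
# b1/b2 are Brassica duplicates, a1/a2 their Arabidopsis orthologs.
b1 = "ATGGCCAAGT"
b2 = "ATGGCCAAGA"
a1 = "ATGACTAAGT"
a2 = "ATCACTTAGA"

print("Brassica b1 flagged:", more_similar_to_paralog(b1, b2, a1))
print("Arabidopsis a1 flagged:", more_similar_to_paralog(a1, a2, b1))
```

Applying the same check to every duplicate in every quartet and taking the flagged fraction per species would give proportions comparable to the 25% and 30% quoted above.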
The gene dosage hypothesis26 predicts that gene functional categor-
ies encoding products that interact with one another or in networks
Table 1 Summary of the final assembly statistics
                          Contig size (bp)   Contig number   Scaffold size (bp)   Scaffold number
N90                       5,593              10,564          357,979              159
N80                       10,984             7,292           773,703              104
N70                       15,947             5,308           1,257,653            77
N60                       21,229             3,874           1,452,355            56
N50                       27,294             2,778           1,971,137            39
Total size                264,110,991                        283,823,632
Total number (>100 bp)                       60,521                               40,549
Total number (>2 kb)                         14,207                               794
[Figure 1 image: area charts for chromosomes A01 to A10. x axis: 0M to 30M. Legend: retrotransposons, DNA transposons, genes (introns), genes (exons).]
Figure 1 Chromosomal distribution of the main B. rapa genome features.
Area charts quantify retrotransposons, genes (exons and introns) and DNA
transposons. The x axis denotes the physical position along the B. rapa
chromosomes in units of million (M) bases.
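[Figure 2 image: four-way Venn diagram. Species labels as printed: C. papaya 19,093 / 13,533; A. thaliana 29,139 / 16,985; B. rapa 32,543 / 16,917; V. vinifera 22,608 / 13,810. Region counts as printed: 821; 3,660; 72; 1,228; 1,043; 9,909; 71; 70; 891; 249; 1,113; 118; 1,412; 48; 1,003.]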
Figure 2 Venn diagram showing unique and shared gene families between
and among four sequenced dicotyledonous species (B. rapa, A. thaliana,
C. papaya and V. vinifera).
should be over retained and genes with products that do not interact
with other gene products should be under retained. In accordance
with this hypothesis, we found B. rapa transcription factors with a
detectable ortholog in A. thaliana to be significantly over retained
(Supplementary Table 8 and Supplementary Note). We obtained
similarly consistent results for genes encoding known protein subunits
of cytoplasmic ribosomes and for genes known to be involved with the
proteasome. We found under retention of genes encoding products
with few interactions, specifically those associated with DNA repair,
nuclease activity, binding and the chloroplast (Supplementary Table 9).
The Gene Ontology annotation classes of over retained genes suggest
that genome triplication may have expanded gene families that
underlie environmental adaptability, as observed in other polyploid
species27. Genes with Gene Ontology terms associated with response
to important environmental factors, including salt, cold, osmotic
stress, light, wounding, pathogen (broad spectrum) defense and both
cadmium and zinc ions, were over retained (Fig. 5). Genes respond-
ing to plant hormones (jasmonic acid, auxin, salicylic acid, ethylene,
brassinosteroid, cytokinin and abscisic acid) were also over retained.
Under selection, Brassica species have a remarkable propensity for the
development of morphological variants28; we analyzed factors poten-
tially involved in this development (Supplementary Note). One factor
may be a general acceleration of nucleotide substitution rates. For 2,275
orthologous groups of genes in B. rapa, A. thaliana, papaya and grape
(Supplementary Table 10), the nucleotide substitution rates in B. rapa
were greater than in the other plants, with average Ks (the number
of synonymous substitutions per synonymous site) and
Ka (the number of non-synonymous substitutions per
non-synonymous site) values 69% and 24%, respectively, greater than papaya
and 1% and 7%, respectively, greater than A. thaliana (Supplementary
Table 11). The much slower evolutionary rate in papaya may be explained
by its longer generation time as a perennial. Another factor may be expan-
sion of auxin-related gene families, as auxin controls many plant growth
and morphological developmental processes29–31. We identified 347
B. rapa genes related to auxin synthesis, transportation, signal transduc-
tion and inactivation, in contrast to 187 such genes present in A. thaliana
(Supplementary Tables 12 and 13 and Supplementary Figs. 5–14). The
TCP gene family is important in the evolution and specification of plant
morphology32. This family has been amplified in B. rapa, which contains
39 TCP genes, more than A. thaliana (24), grape (19) or papaya (21)
(Supplementary Fig. 15). The regulation of flowering is key to many
Brassica morphotypes. Mesohexaploidy has had contrasting effects on
the genes involved. FLC (FLOWERING LOCUS C)33 has three orthologs
in B. rapa as a consequence of the WGT (Supplementary Fig. 16).
Likewise, five of six B. rapa VRN1 (VERNALIZATION1) genes34
[Figure 3 image: dot plot of collinear blocks. Horizontal axis: B. rapa chromosomes A01 to A10, cumulative position 10 to 250 Mb. Vertical axis: A. thaliana chromosomes CHR1 to CHR5, cumulative position 10 to 110 Mb. Blocks are labeled A to X.]
Figure 3 Segmental collinearity of the genomes of B. rapa and A. thaliana. Conserved collinear blocks of gene models are shown between the ten
chromosomes of the B. rapa genome (horizontal axis) and the five chromosomes of the A. thaliana genome (vertical axis). These blocks are labeled A to
X and are color coded by inferred ancestral chromosome following established convention.
[Figure 4 image: line plot. x axis: A. thaliana chromosome (Mb), 0 to 120, spanning Chr1 to Chr5. y axis: percent of orthologs retained, 0 to 100. Series: LF; MF1 and MF2.]
Figure 4 The density of orthologous genes in three subgenomes (LF, MF1
and MF2) of B. rapa compared to A. thaliana. The x axis denotes the
physical position of each A. thaliana gene locus. The y axis denotes the
percentage of retained orthologous genes in B. rapa subgenomes around
each A. thaliana gene, where 500 genes flanking each side of a certain
gene locus were analyzed, giving a total window size of 1,001 genes.
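The windowed retention percentage described in this caption can be sketched as below. The function name is hypothetical, and truncating the window at chromosome ends is an assumption, since the caption does not say how edge loci were handled.

```python
def windowed_retention(retained, flank=500):
    """Percentage of retained orthologs around each gene locus.

    `retained` is a sequence of booleans ordered by A. thaliana position,
    True where the B. rapa subgenome retains an ortholog. Each window
    covers `flank` genes on each side plus the focal gene (1,001 genes
    when flank=500); windows are truncated at the ends of the sequence
    (an assumption).
    """
    n = len(retained)
    # Prefix sums give each window count in constant time.
    prefix = [0]
    for flag in retained:
        prefix.append(prefix[-1] + (1 if flag else 0))
    percentages = []
    for i in range(n):
        start = max(0, i - flank)
        end = min(n, i + flank + 1)
        kept = prefix[end] - prefix[start]
        percentages.append(100.0 * kept / (end - start))
    return percentages


# Hypothetical toy example with a flank of one gene on each side.
toy = [True, False, True, True, False]
print(windowed_retention(toy, flank=1))
```

Running this once per subgenome (LF, MF1 and MF2) over the ordered A. thaliana gene list would yield one curve per subgenome of the kind plotted in Figure 4.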
produced by the WGT have been preserved (Supplementary Fig. 17).
However, GI (GIGANTEA) genes35 have been limited to only one copy
(Supplementary Fig. 18), as have the SVP (SHORT VEGETATIVE
PHASE) genes36 (Supplementary Fig. 19) and each of the three COL
(CONSTANS-LIKE) genes37 (Supplementary Fig. 20).
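The gene-family counts quoted above can be put on a common scale with a small calculation. The sketch below compares each B. rapa count with the A. thaliana count and with a naive expectation of three copies per A. thaliana gene if nothing had been lost after the WGT. That threefold baseline is a simplifying assumption made here for illustration, not a model used in the paper; the counts are those given in the text.

```python
# Gene counts quoted in the text: (B. rapa, A. thaliana).
FAMILY_COUNTS = {
    "auxin-related": (347, 187),
    "TCP": (39, 24),
}


def expansion_summary(b_rapa_count, a_thaliana_count, copies_after_wgt=3):
    """Return (fold_change, fraction_of_naive_expectation).

    fold_change is the B. rapa count divided by the A. thaliana count.
    fraction_of_naive_expectation divides the B. rapa count by
    copies_after_wgt times the A. thaliana count, the number expected
    if every triplicated copy had been retained (an assumption).
    """
    if a_thaliana_count <= 0:
        raise ValueError("A. thaliana count must be positive")
    fold_change = b_rapa_count / a_thaliana_count
    fraction = b_rapa_count / (copies_after_wgt * a_thaliana_count)
    return fold_change, fraction


for family, (b_count, a_count) in FAMILY_COUNTS.items():
    fold, fraction = expansion_summary(b_count, a_count)
    print(f"{family}: {fold:.2f}x A. thaliana, "
          f"{fraction:.0%} of the naive triplicated expectation")
```

On this scale both families are larger than in A. thaliana but well below a full threefold expansion, which fits the overall picture of substantial gene loss after triplication.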
The comparison of the genomes of B. rapa and A. thaliana, like previous
comparisons of the cereals sorghum and rice38, sheds new light on
genome evolution in plants important for human nutrition.
Our growing understanding of the processes shaping the triplicated
genome of the mesopolyploid B. rapa is of relevance not only for closely
related crop species, such as B. oleracea and B. napus, but also for other
important crops with triplicated genomes, such as bread wheat.
URLs. Brassica info, http://www.brassica.info/; GenoScope database,
http://www.genoscope.cns.fr/externe/GenomeBrowser/Vitis/; Hawaii
Papaya Genome Project, http://asgpb.mhpcc.hawaii.edu/papaya/;
Arabidopsis Information Resource, http://www.arabidopsis.org/.
METHODS
Methods and any associated references are available in the online
version of the paper at http://www.nature.com/naturegenetics/.
Accession codes. This whole-genome shotgun project has been depos-
ited at DDBJ/EMBL/GenBank under the accession AENI00000000. The
version described in this paper is the first version, AENI01000000. Full
annotation is available at http://brassicadb.org/.
Note: Supplementary information is available on the Nature Genetics website.
ACKNOWLEDGMENTS
This work was primarily funded by the Chinese Ministry of Science and
Technology, Ministry of Agriculture, Ministry of Finance and the National Natural
Science Foundation of China. Other funding sources included: Core Research
Budget of the Non-profit Governmental Research Institution; the European Union
7th Framework Project; funds from Shenzhen Municipal Government of China;
the Danish Natural Science Research Council; National Academy of Agricultural
Science and the Next-Generation Biogreen21 Program, Rural Development
Administration, Korea; the Technology Development Program for Agriculture and
Forestry, Ministry for Food, Agriculture, Forestry and Fisheries, Korea; United
Kingdom’s Biotechnology and Biological Sciences Research Council; Institut
National de la Recherche Agronomique, France; Japanese Kazusa DNA Research
Institute Foundation; National Science Foundation, USA; Bielefeld University,
Germany; the Australian Research Council; the Australian Grains Research
and Development Corporation; Agriculture and Agri-Food Canada; and the
National Research Council of Canada’s Plant Biotechnology Institute. See the
Supplementary Note for a full list of support and acknowledgments.
AUTHOR CONTRIBUTIONS
Principal investigators: Xiaowu Wang, J. Wu, S.L., Y.B., J.-H.M. and I.B.
DNA and transcriptome sequencing: Bo Wang (group leader), Xiaowu Wang
(group leader), B.C. (group leader), Jun Wang (BGI), K.W., J. Wu, S.L., W.H.,
B.-S.P., I.B., D.E., I.A.P.P., J.-H.M., H.A., Bernd Weisshaar, Shusei Sato, H.H., S.T.,
A.G.S., Y. Lim,