© 2012 Nature America, Inc. All rights reserved.
Nature Genetics ADVANCE ONLINE PUBLICATION
Letters
Knowledge of the rate and pattern of new mutation is critical
to the understanding of human disease and evolution. We used
extensive autozygosity in a genealogically well-defined
population of Hutterites to estimate the human sequence
mutation rate over multiple generations. We sequenced
whole genomes from 5 parent-offspring trios and identified
44 segments of autozygosity. Using the number of meioses
separating each pair of autozygous alleles and the 72 validated
heterozygous single-nucleotide variants (SNVs) from 512 Mb of
autozygous DNA, we obtained an SNV mutation rate of 1.20 ×
10−8 (95% confidence interval 0.89–1.43 × 10−8) mutations
per base pair per generation. The mutation rate for bases within
CpG dinucleotides (9.72 × 10−8) was 9.5-fold that of non-CpG
bases, and there was strong evidence (P = 2.67 × 10−4) for a
paternal bias in the origin of new mutations (85% paternal).
We observed a non-uniform distribution of heterozygous SNVs
(both newly identified and known) in the autozygous segments
(P = 0.001), which is suggestive of mutational hotspots or sites
of long-range gene conversion.
Various approaches have provided a wide range of SNV mutation rate
estimates (1–3 × 10−8 mutations per base pair per generation). Early
studies of mutation rates in humans focused on specific loci or the
de novo incidence of disease1–4. More recent studies have leveraged
whole-genome sequencing data on a total of three nuclear families to
estimate de novo mutation rates for SNVs of approximately 1 × 10−8
mutations per base pair per generation5,6. Comparative studies of
chimpanzee and human genomes provided higher estimates (for
instance, 2.5 × 10−8) but are highly contingent on uncertainty about
the number of generations since human-chimpanzee divergence7.
In contrast to studies that are focused on identifying new mutations
arising in a single generation, the examination of populations
with a small number of founding individuals is ideal for estimating
mutation rates across a small number of generations. The Hutterites
are a population of Anabaptist farmers living on the plains of the
United States and Canada who are descended from a small group of
founders (<90 individuals). The genealogy of this group is completely
known, and genome-wide SNP genotype data have been collected
from over 1,400 individuals who are related to each other in a
13-generation pedigree descended from 64 founders8,9. Due to
increased levels of consanguinity, Hutterite individuals carry large
segments of the genome that are autozygous or homozygous by recent
descent10. The alleles in an autozygous segment are descended from
a recent common ancestor and have accumulated mutations in the
generations since transmission from this individual.
Estimating the human mutation rate using autozygosity
in a founder population
Catarina D Campbell1, Jessica X Chong2, Maika Malig1, Arthur Ko1, Beth L Dumont1, Lide Han2, Laura Vives1,
Brian J O’Roak1, Peter H Sudmant1, Jay Shendure1, Mark Abney2, Carole Ober2,3 & Evan E Eichler1,4
1Department of Genome Sciences, University of Washington, Seattle, Washington, USA. 2Department of Human Genetics, The University of Chicago, Chicago, Illinois,
USA. 3Department of Obstetrics and Gynecology, The University of Chicago, Chicago, Illinois, USA. 4Howard Hughes Medical Institute, Seattle, Washington, USA.
Correspondence should be addressed to E.E.E. (eee@gs.washington.edu).
Received 6 June; accepted 30 August; published online 23 September 2012; doi:10.1038/ng.2418
[Figure 1 image: pedigree diagram with a color scale labeled "Generations"]
Figure 1 Relationship of sequenced individuals. Simplified pedigree
showing the relationship between the 15 sequenced individuals. Black
symbols represent the children in the five trios, and gray symbols
represent their parents. Founders are connected by blue lines, with
the shade of blue indicating the number of generations separating the
connected individuals. For clarity, only the shortest relationships between
each individual and the parents of that individual are shown. The color
scale represents the number of generations separating the individuals,
where darker blue indicates fewer generations and lighter blue indicates
more generations.
We selected five Hutterite parent-offspring trios for whole-genome
sequencing, with the parents in each trio being related to each other
by 6–8 (mean of 6.6) meiotic transmissions (Fig. 1). We performed
whole-genome sequencing of DNA isolated from whole blood using
Illumina paired-end sequencing, generating 775 Gb of sequence
with an average of 13-fold coverage per individual (Supplementary
Table 1). The sequencing reads for each sample were aligned to the
human reference genome (NCBI Build 36). We identified a total
of 5.4 million SNVs on the basis of the intersection of variant calls
from 2 different algorithms11,12 (Supplementary Table 2). The SNP
genotypes from whole-genome sequencing were highly concordant
to those generated by SNP microarray (mean genotype concordance
of 99.7%) (Supplementary Table 2).
We identified extended regions of homozygosity in the offspring
of the five trios and in five previously sequenced genomes (three
European-Americans and two Yoruba)13 (Online Methods). The
extent of homozygosity was correlated to the inbreeding coefficients
of the Hutterite individuals (Supplementary Fig. 1, Supplementary
Table 3 and Supplementary Note). As expected, the five Hutterite
probands showed significantly greater autozygosity (223 Mb on
average per individual) than other European-American individuals
(95 Mb) or the Yoruba individuals (4 Mb) (Fig. 2 and Supplementary
Table 3). Although the amount of short homozygous segments was
[Figure 2 plot: x axis, size of autozygous segments (Mb), 0.5–50; y axis, count; legend: Hutterite (n = 5), European-American (n = 3), YRI (n = 2); bubble size, total Mb in bin]
Figure 2 Elevated autozygosity in the Hutterite individuals. Autozygous
segments were binned by size for the five Hutterite individuals, three
European-American individuals and two Yoruba individuals (YRI).
The x axis represents bins of autozygous segments of different size,
and the y axis shows the number of segments in each bin. In each bin,
individuals are represented by ‘bubbles’, with the size of the bubble
denoting the total amount of genomic sequence in that bin.
[Figure 3 image: (a) chromosome 2, positions 180,000,000–230,000,000, with tracks for individuals 1–5, JF, NA12891, NA12892, NA19238 and NA19239, SNVs marked as homozygous or heterozygous, and a 20-Mb scale bar; (b) pedigree with legend: carrier of autozygous segment, SNP microarray data available, MRCAs (8 generations), ancestor of individual 3, path of maternal inheritance, path of paternal inheritance, same individual, other common ancestors (mean generations = 15.5)]
Figure 3 Determination of the MRCA for an autozygous segment. (a) A 54-Mb autozygous segment on chromosome 2 in individual 3. Genomic
coordinates (hg18) are given on the horizontal axis, and each individual is represented on the vertical axis, including the five Hutterite individuals,
the three European-Americans and the two Yoruba. Each SNV is represented by a vertical bar that is colored blue if the variant is homozygous and
green if it is heterozygous. The autozygous segment in individual 3 is boxed in orange. (b) Determination of the MRCA for the autozygous segment in
individual 3. The pedigree containing all the haplotype carriers of the autozygous haplotype is shown. Individual 3 is shown in yellow. Haplotype carriers
have two MRCAs (boxed) as well as additional common ancestors further up the pedigree. The paths from these individuals to the autozygous subject
are shown in red for the maternal ancestors and blue for the paternal ancestors; all ancestors of the individual are marked with a star. Black dashed
lines represent relationships to common ancestors further back in the pedigree.
similar in the Hutterite individuals and the other European-American
individuals, we observed 33-fold more autozygous base pairs in
segments of greater than 2 Mb in length in the Hutterite individuals
(Fig. 2 and Supplementary Note). We further refined and validated
segments that were longer than 5 Mb by comparing to autozygous
segments identified in SNP microarray data for the same samples10,14
to obtain a final list of 44 regions of autozygosity (6–12 segments
per individual; 5–54 Mb in length). We restricted subsequent analyses
to these 512 Mb of autozygous DNA (Supplementary Fig. 2 and
Supplementary Table 4).
We determined the number of meioses separating each allele within
each autozygous segment. The small founding population and complex
genealogy of the Hutterite population (Fig. 1) made this potentially
problematic because of the large number of shared common ancestors
and multiple paths of descent between any ancestor-descendant
pair. To resolve the ancestry of the autozygous segments, we combined
the pedigree structure and genome-wide SNP genotype data9 to
identify the most recent common ancestors (MRCAs) on the basis of
segregation within the Hutterite genealogy8 (Fig. 3, Supplementary
Fig. 3 and Supplementary Note). Using the identified MRCAs, we
estimated that the 2 haplotypes of the 44 autozygous segments were
separated by 8–18 meioses (Supplementary Table 4).
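The meiosis count for one candidate common ancestor can be illustrated with a small sketch. The toy pedigree (a child of first cousins), the dictionary layout and the function names below are assumptions for illustration only. The analysis described above also used genotype segregation to choose among many possible ancestors, which this sketch does not attempt; it only counts transmissions along the shortest maternal and paternal paths.

```python
from collections import deque

# Hypothetical toy pedigree: person -> (father, mother). Founders are absent.
PEDIGREE = {
    "proband": ("dad", "mom"),
    "dad": ("gf1", "gm1"),
    "mom": ("gf2", "gm2"),
    "gf1": ("ggf", "ggm"),  # gf1 and gm2 are siblings,
    "gm2": ("ggf", "ggm"),  # so dad and mom are first cousins
}

def ancestor_depths(pedigree, person):
    """Map each ancestor (and the person) to the fewest meioses down to person."""
    depths = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in pedigree.get(current, ()):
            if parent not in depths:  # breadth-first, so first visit is shortest
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths

def min_meioses_between_alleles(pedigree, proband):
    """Fewest meioses separating the proband's two alleles via a shared ancestor."""
    father, mother = pedigree[proband]
    paternal = ancestor_depths(pedigree, father)
    maternal = ancestor_depths(pedigree, mother)
    shared = set(paternal) & set(maternal)
    if not shared:
        return None
    # Add 2 for the transmissions from each parent to the proband.
    return min(paternal[a] + maternal[a] for a in shared) + 2

print(min_meioses_between_alleles(PEDIGREE, "proband"))
```

In a pedigree with many loops, the shortest path is not necessarily the path along which a given segment was inherited, which is why haplotype carrier information matters for the real analysis.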
To calculate the SNV mutation rate, we identified heterozygous
SNVs within each autozygous segment, excluding regions of common
repeats, segmental duplication and known SNPs (found in dbSNP132).
We validated 72 SNVs as heterozygous by Sanger-based capillary
sequencing (Table 1, Supplementary Table 5 and Supplementary
Note). We calculated an SNV mutation rate (µ) of 1.20 × 10−8 (95%
confidence interval (CI) = 0.89–1.43 × 10−8) mutations per base pair
per generation. We observed consistent µ values across the five trios,
with values ranging between 0.92 × 10−8 and 1.51 × 10−8 (Fig. 4 and
Table 1). Among these mutations, we observed an excess of transitions
relative to transversions, resulting in a Ti/Tv ratio of 1.64 that
was not significantly different from the genome-wide SNV ratio of
2.17 (two-tailed χ-squared P = 0.27). Twelve of the 72 validated heterozygous
SNVs (16.7%) mapped to CpG dinucleotides. We calculated
a µ value for CpG sites of 9.72 × 10−8 mutations per CpG base pair
per generation, which is 9.5× greater than the µ value for non-CpG
bases (1.02 × 10−8). We also estimated the mutation rate on the basis of
de novo mutations in the most recent generation (Supplementary
Table 6 and Supplementary Note). Using 176 validated de novo
SNVs, we calculated a mutation rate of 0.96 × 10−8 (95% CI = 0.82–
1.09 × 10−8) mutations per base pair per generation; although this rate
is lower than the one calculated using autozygosity, the confidence
intervals of these rates overlap (Fig. 4).
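As a rough check on the arithmetic, the rate can be approximated as the number of validated SNVs divided by the product of callable base pairs and mean meioses. The sketch below uses the values in Table 1; the function name and the simple ratio are assumptions, and any adjustments made in the full analysis are not modeled, so the output need not match the reported values exactly.

```python
# Rows copied from Table 1: individual -> (validated SNVs, callable Mb, mean meioses).
TABLE_1 = {
    "1": (13, 63.4, 13.8),
    "2": (7, 55.9, 13.8),
    "3": (13, 124.8, 9.9),
    "4": (19, 147.6, 12.0),
    "5": (20, 120.8, 12.0),
    "All": (72, 512.4, 11.9),
}

def naive_snv_rate(n_snvs, callable_mb, mean_meioses):
    """Naive rate per base pair per generation: SNVs / (callable bp x meioses)."""
    return n_snvs / (callable_mb * 1e6 * mean_meioses)

rates = {name: naive_snv_rate(*row) for name, row in TABLE_1.items()}
for name, rate in rates.items():
    print(f"{name}: {rate * 1e8:.2f} x 10^-8 per bp per generation")

# Ratio of the reported CpG and non-CpG rates.
cpg_fold = 9.72e-8 / 1.02e-8
print(f"CpG vs non-CpG: {cpg_fold:.1f}-fold")
```

Under these assumptions the naive ratios come out close to, though slightly below, the values reported in Table 1.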
We identified and validated one potential gene conversion event
involving paralogs of segmental duplications containing the genes
C4A and C4B in a region where lower copy number has been
associated with lupus15. Although individual 4 had a total diploid
copy number of six for this CNV, we
determined that the sequence content of
the two alleles differed (Supplementary
Fig. 4), likely as a result of gene conversion
between paralogous copies of, at a minimum,
the TNXA and TNXB genes (6 kb).
Both theoretical and experimental analyses
have predicted that the male germline contributes
disproportionately to de novo mutations
compared to the female germline7,16,17.
However, a recent analysis on two parent-
offspring trios reported a paternal bias in
mutation in one trio and a maternal bias in the other5. Given the
complexity of the Hutterite pedigree and transmissions through
multiple female and male ancestors, we focused on the putative
genome-wide de novo mutations in the most recent generation. We
used molecular phasing5,12,17 to determine the parental origin of 26
of the 176 validated de novo SNVs and found that 84.6% (22 of 26;
95% CI = 70.8–98.5%) of de novo SNVs originated on the paternal
haplotype, confirming a male bias for new SNVs (two-tailed binomial
P = 2.67 × 10−4).
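The parent-of-origin test can be sketched as an exact binomial calculation against a null in which each phased mutation is equally likely to be paternal or maternal. The function name is an assumption, and the sketch prints both the upper tail and its doubled value because the exact convention behind the reported two-tailed P value is not spelled out in the text above.

```python
from math import comb

def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )

paternal, phased = 22, 26  # counts reported in the text
upper = binomial_upper_tail(paternal, phased)
doubled = min(1.0, 2 * upper)  # the null is symmetric, so both tails are equal

print(f"paternal fraction = {paternal / phased:.1%}")
print(f"upper tail = {upper:.3g}, doubled = {doubled:.3g}")
```

With these inputs the upper tail alone is close to the reported 2.67 × 10−4; the doubled value is shown only for comparison.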
One advantage of using autozygosity in the identification of recent
mutations is the ability to identify potential gene conversion events
between homologous chromosomes. Such events could lead to clusters
of heterozygous SNVs (including known SNPs) within regions of
autozygosity, and we identified four clusters (with two or more SNVs
mapping within 10 kb of each other) (Table 2). One of these clusters
is 309 kb in length, suggesting that it most likely arose as a product of
crossover events18. Excluding this large cluster, the average distance
between heterozygous SNPs in the remaining three clusters was
2,723 bp (range of 7–7,839 bp). We tested this distribution by simulation
(n = 10,000 replicates) and determined that there was a significant
excess of ‘clustered’ SNVs compared to that expected with a
random distribution of variants (empirical P = 0.001).
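A simulation of this kind can be sketched as follows. Everything about the design here is an assumption rather than the authors' procedure: the autozygous DNA is treated as one contiguous interval, variants are placed uniformly at random, the test statistic is the number of variants with a neighbor within 10 kb, and the input counts are illustrative.

```python
import random

def count_clustered(positions, window=10_000):
    """Count variants that have at least one neighbor within `window` bp."""
    pos = sorted(positions)
    clustered = 0
    for i, p in enumerate(pos):
        near_prev = i > 0 and p - pos[i - 1] <= window
        near_next = i + 1 < len(pos) and pos[i + 1] - p <= window
        if near_prev or near_next:
            clustered += 1
    return clustered

def empirical_p(observed, n_variants, total_bp, replicates=10_000, seed=0):
    """Fraction of random replicates with at least `observed` clustered variants."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        simulated = [rng.randrange(total_bp) for _ in range(n_variants)]
        if count_clustered(simulated) >= observed:
            hits += 1
    return hits / replicates

# Illustrative inputs (hypothetical): 94 variants across 512.4 Mb, 10 clustered.
p = empirical_p(observed=10, n_variants=94, total_bp=512_400_000, replicates=1_000)
print(f"empirical P = {p}")
```

A fuller version would place variants only within the callable bases of each of the 44 segments and would need a convention for reporting replicates in which no simulated data set reaches the observed value.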
We also tested whether the de novo SNVs in the most recent
generation were uniformly distributed in the genome. Notably,
we observed three clusters of validated de novo variants (Table 2
and Supplementary Table 5) and a significant excess (empirical
P = 6 × 10−6) of de novo SNVs in close proximity (<10 kb) using
simulations (n = 1,000,000 replicates).
Table 1 SNV mutation rates determined from segments of autozygosity
Individual  Segments (>5 Mb)  Total callable (Mb)a  Mean meioses (MRCA)b  SNVsc  SNV µ        95% CId
1           7                 63.4                  13.8                  13     1.51 × 10−8  0.62–2.28 × 10−8
2           6                 55.9                  13.8                  7      0.92 × 10−8  0.17–1.72 × 10−8
3           9                 124.8                 9.9                   13     1.07 × 10−8  0.45–1.63 × 10−8
4           10                147.6                 12.0                  19     1.09 × 10−8  0.56–1.55 × 10−8
5           12                120.8                 12.0                  20     1.40 × 10−8  0.73–1.96 × 10−8
All         44                512.4                 11.9                  72     1.20 × 10−8  0.89–1.43 × 10−8
aNon-segmental duplication, non-simple repeat and non-dbSNP132 variants with at least six mapped reads. bWeighted by
length of segment. cValidated as newly identified, heterozygous variants. dBased on a Poisson distribution.
[Figure 4 plot: x axis, individual (1–5 and All); y axis, SNV µ (× 10−8), 0–2.5; series: autozygosity, last generation]
Figure 4 SNV mutation rate estimates. The SNV mutation rate point
estimates are shown for each individual and all five individuals combined,
with the error bars representing the 95% CIs that were generated on the
basis of a Poisson distribution. SNV µ is the number of SNV mutations
per base pair per generation. Filled diamonds represent estimates from
autozygous segments, and open diamonds represent estimates from SNVs
identified in the most recent generation.
There has recently been much interest in using massively parallel
sequencing data to obtain an accurate estimate of the mutation rate
using nuclear families5,6. We developed an approach using extended
regions of autozygosity to discover new mutations that have emerged
within a few generations. Compared to analyses focused on de novo
mutations in a single generation, our approach significantly reduces
the number of false positives and somatic mutations, as most
mutations in autozygous segments are transmitted from one of the
parents. In addition, given the relationship between paternal age and
the number of de novo mutations17,19, our approach reduces this
confounding effect by yielding an average mutation rate over 8–18
meioses. Disadvantages include uncertainty about the ancestry of the
autozygous segments (Supplementary Note), the smaller genomic
‘search space’ (512 Mb), the potential to confound new mutation and
gene conversion events and increased potential for purifying selection
to eliminate a small fraction of new mutations, although the fraction
of such events should be negligible20. We have tried to reduce
the confounding effect of gene conversion by limiting our analysis
to newly identified SNVs. We estimated an SNV mutation rate of
1.20 × 10−8 mutations per base pair per generation using autozygous
segments, which is higher than the rate of 0.96 × 10−8 that we estimated
for the most recent generation and the rate of 1.1 × 10−8 that
was previously published for the whole genome5,6 yet lower than the
rate estimated in a recent resequencing study21.
While this manuscript was under review, two additional studies
characterizing the human mutation rate were published. First, a
sequence mutation rate of 1.2 × 10−8 was calculated from an analysis
of whole-genome sequencing of over 70 trios19, which is equal to
the rate obtained in our analysis, suggesting the accuracy of using
autozygosity to estimate the mutation rate. Notably, the reported
quantification of the correlation between the number of mutations
and paternal age19 suggests that the relatively young age of the fathers
of the trios analyzed here (21–30 years at the time of childbirth) may
provide an explanation for the lower mutation rate we observed in the
most recent generation. In a second publication, an inferred sequence
mutation rate of 1.82 × 10−8 was calculated by modeling population
genetic parameters on the basis of the mutational properties of microsatellites22;
the differences between this estimate and our estimates
are likely due to differences in methodology.
We observed a non-uniform distribution of a small fraction of
mutations within autozygous segments that seem to provide evidence
of recent allelic gene conversion. We observed three clusters
of variants that were unlikely to have been generated by crossover
mechanisms and might represent potential allelic gene conversion
events in the autozygous segments, although one cluster (11 kb) was
larger than expected for typical gene conversion events23. Only one
of the ten SNVs in these clusters was in a CpG dinucleotide, and the
GC content (0.36–0.50) of these three regions
was not consistent with a model of recurrent
mutation due to CpG methylation and
demethylation. The average distance between
heterozygous SNPs in these clusters was
2,723 bp, ruling out compound mutation24
as a likely mechanism. Notably, one of these
clusters of heterozygous SNVs consists
of two new SNVs (not in dbSNP) and could
be further evidence of a non-uniform distribution
of new mutations similar to what we
observed for de novo mutations. In addition,
we observed an excess of heterozygous bases
at dbSNP positions in autozygous segments
(n = 22), most of which were not clustered with other heterozygous
variants (16 of 22) but may also be evidence of recent gene conversion
events.
Notably, we observed a non-uniform distribution (three clusters
with two de novo SNVs within 10 kb; range 7–3,921 bp) (empirical
P = 6 × 10−6) among the validated de novo events. One of these
clusters contained SNVs that were 7 bp apart, suggesting a compound
mutational event24; on the basis of this event, 0.97% of de
novo mutations were calculated to be part of multinucleotide
mutations (95% CI (Wilson method) = 0.27–3.5%). Although
this estimate is somewhat lower than the estimate of 2–3% of de novo
mutations in compound mutational events, which was estimated on
the basis of whole-genome sequencing data from two trios24, the