Estimating the human mutation rate using autozygosity in a founder population

© 2012 Nature America, Inc. All rights reserved. Nature Genetics, advance online publication. Letters.

Knowledge of the rate and pattern of new mutation is critical to the understanding of human disease and evolution. We used extensive autozygosity in a genealogically well-defined population of Hutterites to estimate the human sequence mutation rate over multiple generations. We sequenced whole genomes from 5 parent-offspring trios and identified 44 segments of autozygosity. Using the number of meioses separating each pair of autozygous alleles and the 72 validated heterozygous single-nucleotide variants (SNVs) from 512 Mb of autozygous DNA, we obtained an SNV mutation rate of 1.20 × 10⁻⁸ (95% confidence interval 0.89–1.43 × 10⁻⁸) mutations per base pair per generation. The mutation rate for bases within CpG dinucleotides (9.72 × 10⁻⁸) was 9.5-fold that of non-CpG bases, and there was strong evidence (P = 2.67 × 10⁻⁴) for a paternal bias in the origin of new mutations (85% paternal). We observed a non-uniform distribution of heterozygous SNVs (both newly identified and known) in the autozygous segments (P = 0.001), which is suggestive of mutational hotspots or sites of long-range gene conversion.

Various approaches have provided a wide range of SNV mutation rate estimates (1–3 × 10⁻⁸ mutations per base pair per generation). Early studies of mutation rates in humans focused on specific loci or the de novo incidence of disease [1–4]. More recent studies have leveraged whole-genome sequencing data on a total of three nuclear families to estimate de novo mutation rates for SNVs of approximately 1 × 10⁻⁸ mutations per base pair per generation [5,6]. Comparative studies of chimpanzee and human genomes provided higher estimates (for instance, 2.5 × 10⁻⁸) but are highly contingent on uncertainty about the number of generations since human-chimpanzee divergence [7].
In contrast to studies that are focused on identifying new mutations arising in a single generation, the examination of populations with a small number of founding individuals is ideal for estimating mutation rates across a small number of generations. The Hutterites are a population of Anabaptist farmers living on the plains of the United States and Canada who are descended from a small group of founders (<90 individuals). The genealogy of this group is completely known, and genome-wide SNP genotype data have been collected from over 1,400 individuals who are related to each other in a 13-generation pedigree descended from 64 founders [8,9]. Due to increased levels of consanguinity, Hutterite individuals carry large segments of the genome that are autozygous, or homozygous by recent descent [10]. The alleles in an autozygous segment are descended from a recent common ancestor and have accumulated mutations in the generations since transmission from this individual.

Catarina D Campbell¹, Jessica X Chong², Maika Malig¹, Arthur Ko¹, Beth L Dumont¹, Lide Han², Laura Vives¹, Brian J O’Roak¹, Peter H Sudmant¹, Jay Shendure¹, Mark Abney², Carole Ober²,³ & Evan E Eichler¹,⁴

¹Department of Genome Sciences, University of Washington, Seattle, Washington, USA. ²Department of Human Genetics, The University of Chicago, Chicago, Illinois, USA. ³Department of Obstetrics and Gynecology, The University of Chicago, Chicago, Illinois, USA. ⁴Howard Hughes Medical Institute, Seattle, Washington, USA. Correspondence should be addressed to E.E.E. (eee@gs.washington.edu).

Received 6 June; accepted 30 August; published online 23 September 2012; doi:10.1038/ng.2418

Figure 1 Relationship of sequenced individuals. Simplified pedigree showing the relationship between the 15 sequenced individuals. Black symbols represent the children in the five trios, and gray symbols represent their parents. Founders are connected by blue lines, with the shade of blue indicating the number of generations separating the connected individuals. For clarity, only the shortest relationships between each individual and the parents of that individual are shown. The color scale represents the number of generations separating the individuals, where darker blue indicates fewer generations and lighter blue indicates more generations.

We selected five Hutterite parent-offspring trios for whole-genome sequencing, with the parents in each trio being related to each other by 6–8 (mean of 6.6) meiotic transmissions (Fig. 1). We performed whole-genome sequencing of DNA isolated from whole blood using Illumina paired-end sequencing, generating 775 Gb of sequence with an average of 13-fold coverage per individual (Supplementary Table 1). The sequencing reads for each sample were aligned to the human reference genome (NCBI Build 36). We identified a total of 5.4 million SNVs on the basis of the intersection of variant calls from 2 different algorithms [11,12] (Supplementary Table 2). The SNP genotypes from whole-genome sequencing were highly concordant with those generated by SNP microarray (mean genotype concordance of 99.7%) (Supplementary Table 2).

We identified extended regions of homozygosity in the offspring of the five trios and in five previously sequenced genomes (three European-Americans and two Yoruba) [13] (Online Methods). The extent of homozygosity was correlated with the inbreeding coefficients of the Hutterite individuals (Supplementary Fig. 1, Supplementary Table 3 and Supplementary Note). As expected, the five Hutterite probands showed significantly greater autozygosity (223 Mb on average per individual) than other European-American individuals (95 Mb) or the Yoruba individuals (4 Mb) (Fig. 2 and Supplementary Table 3).
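As a rough illustration of how extended homozygous regions can be located from genotype calls, the sketch below scans one chromosome for maximal runs of consecutive homozygous calls and keeps the runs that span at least a chosen length. It is not the procedure of the Online Methods, which are not reproduced here. The function name `homozygous_runs`, the input layout and the 1 Mb default threshold are assumptions made for illustration, and a practical caller would also tolerate occasional heterozygous calls caused by genotyping error.

```python
from typing import List, Tuple


def homozygous_runs(positions: List[int], is_het: List[bool],
                    min_length_bp: int = 1_000_000) -> List[Tuple[int, int]]:
    """Return (start, end) for maximal runs of consecutive homozygous calls.

    positions : sorted variant coordinates on one chromosome
    is_het    : True where the call at that position is heterozygous
    Hypothetical helper for illustration; not the paper's algorithm.
    """
    runs: List[Tuple[int, int]] = []
    run_start = None   # first homozygous position of the current run
    last_hom = None    # most recent homozygous position of the current run
    for pos, het in zip(positions, is_het):
        if het:
            # A heterozygous call closes the current run, if any.
            if run_start is not None and last_hom - run_start >= min_length_bp:
                runs.append((run_start, last_hom))
            run_start = None
        else:
            if run_start is None:
                run_start = pos
            last_hom = pos
    # Close a run that extends to the last call.
    if run_start is not None and last_hom - run_start >= min_length_bp:
        runs.append((run_start, last_hom))
    return runs


example_positions = [100, 500_000, 1_200_000, 1_300_000, 1_400_000, 4_000_000]
example_is_het = [False, False, False, True, False, False]
example_runs = homozygous_runs(example_positions, example_is_het)
```

In the toy input, the single heterozygous call at 1,300,000 splits the chromosome into two homozygous runs, both longer than the default threshold.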
Although the amount of short homozygous segments was similar in the Hutterite individuals and the other European-American individuals, we observed 33-fold more autozygous base pairs in segments of greater than 2 Mb in length in the Hutterite individuals (Fig. 2 and Supplementary Note). We further refined and validated segments that were longer than 5 Mb by comparing to autozygous segments identified in SNP microarray data for the same samples [10,14] to obtain a final list of 44 regions of autozygosity (6–12 segments per individual; 5–54 Mb in length). We restricted subsequent analyses to these 512 Mb of autozygous DNA (Supplementary Fig. 2 and Supplementary Table 4).

Figure 2 Elevated autozygosity in the Hutterite individuals. Autozygous segments were binned by size for the five Hutterite individuals, three European-American individuals and two Yoruba individuals (YRI). The x axis represents bins of autozygous segments of different size, and the y axis shows the number of segments in each bin. In each bin, individuals are represented by ‘bubbles’, with the size of the bubble denoting the total amount of genomic sequence in that bin.
[Figure 2 axes: x, size of autozygous segments (Mb), 0.5–50; y, count; bubble size, total Mb in bin. Groups: Hutterite (n = 5), European-American (n = 3), YRI (n = 2).]

Figure 3 Determination of the MRCA for an autozygous segment. (a) A 54-Mb autozygous segment on chromosome 2 in individual 3. Genomic coordinates (hg18) are given on the horizontal axis, and each individual is represented on the vertical axis, including the five Hutterite individuals, the three European-Americans and the two Yoruba. Each SNV is represented by a vertical bar that is colored blue if the variant is homozygous and green if it is heterozygous. The autozygous segment in individual 3 is boxed in orange. (b) Determination of the MRCA for the autozygous segment in individual 3. The pedigree containing all the haplotype carriers of the autozygous haplotype is shown. Individual 3 is shown in yellow. Haplotype carriers have two MRCAs (boxed) as well as additional common ancestors further up the pedigree. The paths from these individuals to the autozygous subject are shown in red for the maternal ancestors and blue for the paternal ancestors; all ancestors of the individual are marked with a star. Black dashed lines represent relationships to common ancestors further back in the pedigree.
[Figure 3 labels: panel a tracks, individuals 1–5, JF, NA12891, NA12892, NA19238, NA19239; chromosome 2, 180,000,000–230,000,000; scale bar, 20 Mb. Panel b legend: carrier of autozygous segment; SNP microarray data available; MRCAs (8 generations); ancestor of individual 3; path of maternal inheritance; path of paternal inheritance; same individual; other common ancestors (mean generations = 15.5).]

We determined the number of meioses separating each allele within each autozygous segment. The small founding population and complex genealogy of the Hutterite population (Fig. 1) made this potentially problematic because of the large number of shared common ancestors and multiple paths of descent between any ancestor-descendant pair. To resolve the ancestry of the autozygous segments, we combined the pedigree structure and genome-wide SNP genotype data [9] to identify the most recent common ancestors (MRCAs) on the basis of segregation within the Hutterite genealogy [8] (Fig. 3, Supplementary Fig. 3 and Supplementary Note). Using the identified MRCAs, we estimated that the 2 haplotypes of the 44 autozygous segments were separated by 8–18 meioses (Supplementary Table 4).
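To make the meiosis count concrete, the following sketch counts the transmissions that separate the two alleles of a proband when both descend from one chosen common ancestor: the steps from the ancestor down the paternal line plus the steps down the maternal line. It is a toy model under stated assumptions. The pedigree, the names `meioses_up` and `meioses_between_alleles`, and the use of the shortest path are all hypothetical, whereas the analysis described above combined the real genealogy with SNP genotype data because many alternative paths exist.

```python
from collections import deque
from typing import Dict, Optional, Tuple

# child -> (father, mother); founders are simply absent from the mapping
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def meioses_up(pedigree: Pedigree, start: str, ancestor: str) -> Optional[int]:
    """Fewest parent-to-child transmissions linking `ancestor` to `start`.

    Breadth-first search up the pedigree; returns None if no path exists.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        person, steps = queue.popleft()
        if person == ancestor:
            return steps
        for parent in pedigree.get(person, (None, None)):
            if parent is not None and parent not in seen:
                seen.add(parent)
                queue.append((parent, steps + 1))
    return None


def meioses_between_alleles(pedigree: Pedigree, proband: str,
                            ancestor: str) -> Optional[int]:
    """Meioses separating the proband's two alleles via one common ancestor."""
    father, mother = pedigree[proband]
    up_father = meioses_up(pedigree, father, ancestor)
    up_mother = meioses_up(pedigree, mother, ancestor)
    if up_father is None or up_mother is None:
        return None
    # Add one transmission per side for parent -> proband.
    return (up_father + 1) + (up_mother + 1)


# Toy first-cousin pedigree (hypothetical): G is a shared grandparent of F and M.
toy_pedigree: Pedigree = {
    "A": ("G", "G_spouse"),
    "B": ("G", "G_spouse"),
    "F": ("A", "A_spouse"),
    "M": ("B_spouse", "B"),
    "P": ("F", "M"),
}
toy_meioses = meioses_between_alleles(toy_pedigree, "P", "G")
```

For the child of first cousins in this toy pedigree the count is three transmissions on each side, six in total, which is shorter than the 8–18 meioses reported for the Hutterite segments.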
To calculate the SNV mutation rate, we identified heterozygous SNVs within each autozygous segment, excluding regions of common repeats, segmental duplication and known SNPs (found in dbSNP132). We validated 72 SNVs as heterozygous by Sanger-based capillary sequencing (Table 1, Supplementary Table 5 and Supplementary Note). We calculated an SNV mutation rate (µ) of 1.20 × 10⁻⁸ (95% confidence interval (CI) = 0.89–1.43 × 10⁻⁸) mutations per base pair per generation. We observed consistent µ values across the five trios, with values ranging between 0.92 × 10⁻⁸ and 1.51 × 10⁻⁸ (Fig. 4 and Table 1). Among these mutations, we observed an excess of transitions relative to transversions, resulting in a Ti/Tv ratio of 1.64 that was not significantly different from the genome-wide SNV ratio of 2.17 (two-tailed χ² P = 0.27). Twelve of the 72 validated heterozygous SNVs (16.7%) mapped to CpG dinucleotides. We calculated a µ value for CpG sites of 9.72 × 10⁻⁸ mutations per CpG base pair per generation, which is 9.5-fold the µ value for non-CpG bases (1.02 × 10⁻⁸).

We also estimated the mutation rate on the basis of de novo mutations in the most recent generation (Supplementary Table 6 and Supplementary Note). Using 176 validated de novo SNVs, we calculated a mutation rate of 0.96 × 10⁻⁸ (95% CI = 0.82–1.09 × 10⁻⁸) mutations per base pair per generation; although this rate is lower than the one calculated using autozygosity, the confidence intervals of these rates overlap (Fig. 4).

We identified and validated one potential gene conversion event involving paralogs of segmental duplications containing the genes C4A and C4B in a region where lower copy number has been associated with lupus [15]. Although individual 4 had a total diploid copy number of six for this CNV, we determined that the sequence content of the two alleles differed (Supplementary Fig. 4), likely as a result of gene conversion between paralogous copies of, at a minimum, the TNXA and TNXB genes (6 kb).

Both theoretical and experimental analyses have predicted that the male germline contributes disproportionately to de novo mutations compared to the female germline [7,16,17]. However, a recent analysis of two parent-offspring trios reported a paternal bias in mutation in one trio and a maternal bias in the other [5]. Given the complexity of the Hutterite pedigree and transmissions through multiple female and male ancestors, we focused on the putative genome-wide de novo mutations in the most recent generation. We used molecular phasing [5,12,17] to determine the parental origin of 26 of the 176 validated de novo SNVs and found that 84.6% (22 of 26; 95% CI = 70.8–98.5%) of de novo SNPs originated on the paternal haplotype, confirming a male bias for new SNVs (two-tailed binomial P = 2.67 × 10⁻⁴).

One advantage of using autozygosity in the identification of recent mutations is the ability to identify potential gene conversion events between homologous chromosomes. Such events could lead to clusters of heterozygous SNVs (including known SNPs) within regions of autozygosity, and we identified four clusters (with two or more SNVs mapping within 10 kb of each other) (Table 2). One of these clusters is 309 kb in length, suggesting that it most likely arose as a product of crossover events [18]. Excluding this large cluster, the average distance between heterozygous SNPs in the remaining three clusters was 2,723 bp (range of 7–7,839 bp). We tested this distribution by simulation (n = 10,000 replicates) and determined that there was a significant excess of ‘clustered’ SNVs compared to that expected with a random distribution of variants (empirical P = 0.001).

We also tested whether the de novo SNVs in the most recent generation were uniformly distributed in the genome.
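The parent-of-origin result reported above (22 of 26 phased de novo SNVs on the paternal haplotype) can be checked with an exact binomial calculation against a null of equal paternal and maternal contribution. The sketch below uses only the standard library. The helper names and the choice of a normal-approximation (Wald) interval are assumptions, since the text does not say how the interval or the P value was computed.

```python
from math import comb, sqrt
from typing import Tuple


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def wald_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% interval for a proportion (assumed method)."""
    p_hat = k / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


paternal, phased = 22, 26                      # counts taken from the text
upper_tail = binomial_upper_tail(paternal, phased)
doubled_tail = min(1.0, 2 * upper_tail)        # symmetric two-sided version
ci_low, ci_high = wald_interval(paternal, phased)

print(f"paternal fraction: {paternal / phased:.3f}")
print(f"upper-tail P:      {upper_tail:.3g}")
print(f"doubled-tail P:    {doubled_tail:.3g}")
print(f"Wald 95% interval: {ci_low:.3f}-{ci_high:.3f}")
```

Under these assumptions the upper tail alone evaluates to about 2.67 × 10⁻⁴, which coincides with the reported P value, and doubling it gives roughly 5.3 × 10⁻⁴. The Wald interval comes out close to the reported 70.8–98.5%. Which tail convention and interval method were actually used cannot be determined from this section.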
Notably, we observed three clusters of validated de novo variants (Table 2 and Supplementary Table 5) and a significant excess (empirical P = 6 × 10⁻⁶) of de novo SNVs in close proximity (<10 kb) using simulations (n = 1,000,000 replicates).

Table 1 SNV mutation rates determined from segments of autozygosity

Individual   Segments (>5 Mb)   Total callable (Mb)a   Mean meioses (MRCA)b   SNVsc   SNV µ (× 10⁻⁸)   95% CId (× 10⁻⁸)
1            7                  63.4                   13.8                   13      1.51             0.62–2.28
2            6                  55.9                   13.8                   7       0.92             0.17–1.72
3            9                  124.8                  9.9                    13      1.07             0.45–1.63
4            10                 147.6                  12.0                   19      1.09             0.56–1.55
5            12                 120.8                  12.0                   20      1.40             0.73–1.96
All          44                 512.4                  11.9                   72      1.20             0.89–1.43

aNon-segmental duplication, non-simple repeat and non-dbSNP132 variants with at least six mapped reads. bWeighted by length of segment. cValidated as newly identified, heterozygous variants. dBased on a Poisson distribution.

Figure 4 SNV mutation rate estimates. The SNV mutation rate point estimates are shown for each individual and all five individuals combined, with the error bars representing the 95% CIs that were generated on the basis of a Poisson distribution. SNV µ is the number of SNV mutations per base pair per generation. Filled diamonds represent estimates from autozygous segments, and open diamonds represent estimates from SNVs identified in the most recent generation.
[Figure 4 axes: x, individual (1–5 and All); y, SNV µ (× 10⁻⁸), 0–2.5. Series: autozygosity; last generation.]

There has recently been much interest in using massively parallel sequencing data to obtain an accurate estimate of the mutation rate using nuclear families [5,6]. We developed an approach using extended regions of autozygosity to discover new mutations that have emerged within a few generations.
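The point estimates in Table 1 can be approximated with the simplest possible estimator: validated SNVs divided by the product of callable base pairs and the mean number of meioses. The sketch below applies it to the rounded values printed in the table. The estimator and the name `naive_rate` are assumptions, not the authors' stated formula. It gives values about 1–2% below the published ones, so the published calculation presumably includes details not visible in this section, for example per-segment meiosis counts or a correction factor.

```python
from typing import Dict, Tuple

# individual -> (callable Mb, mean meioses, validated SNVs, reported mu x 1e-8)
TABLE1: Dict[str, Tuple[float, float, int, float]] = {
    "1":   (63.4, 13.8, 13, 1.51),
    "2":   (55.9, 13.8, 7, 0.92),
    "3":   (124.8, 9.9, 13, 1.07),
    "4":   (147.6, 12.0, 19, 1.09),
    "5":   (120.8, 12.0, 20, 1.40),
    "All": (512.4, 11.9, 72, 1.20),
}


def naive_rate(callable_mb: float, mean_meioses: float, snvs: int) -> float:
    """Mutations per base pair per generation (assumed simple estimator)."""
    return snvs / (callable_mb * 1e6 * mean_meioses)


naive_estimates = {
    name: naive_rate(mb, meioses, snvs)
    for name, (mb, meioses, snvs, _reported) in TABLE1.items()
}

for name, (_mb, _meioses, _snvs, reported) in TABLE1.items():
    print(f"{name:>3}: naive {naive_estimates[name] * 1e8:.2f} x 1e-8, "
          f"reported {reported:.2f} x 1e-8")
```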
Compared to analyses focused on de novo mutations in a single generation, our approach significantly reduces the number of false positives and somatic mutations, as most mutations in autozygous segments are transmitted from one of the parents. In addition, given the relationship between paternal age and the number of de novo mutations [17,19], our approach reduces this confounding effect by yielding an average mutation rate over 8–18 meioses. Disadvantages include uncertainty about the ancestry of the autozygous segments (Supplementary Note), the smaller genomic ‘search space’ (512 Mb), the potential to confound new mutation and gene conversion events and increased potential for purifying selection to eliminate a small fraction of new mutations, although the fraction of such events should be negligible [20]. We have tried to reduce the confounding effect of gene conversion by limiting our analysis to newly identified SNVs. We estimated an SNV mutation rate of 1.20 × 10⁻⁸ mutations per base pair per generation using autozygous segments, which is higher than the rate of 0.96 × 10⁻⁸ that we estimated for the most recent generation and the rate of 1.1 × 10⁻⁸ that was previously published for the whole genome [5,6] yet lower than the rate estimated in a recent resequencing study [21].

While this manuscript was under review, two additional studies characterizing the human mutation rate were published. First, a sequence mutation rate of 1.2 × 10⁻⁸ was calculated from an analysis of whole-genome sequencing of over 70 trios [19], which is equal to the rate obtained in our analysis, suggesting the accuracy of using autozygosity to estimate the mutation rate. Notably, the reported quantification of the correlation between the number of mutations and paternal age [19] suggests that the relatively young age of the fathers of the trios analyzed here (21–30 years at the time of childbirth) may provide an explanation for the lower mutation rate we observed in the most recent generation.
In a second publication, an inferred sequence mutation rate of 1.82 × 10⁻⁸ was calculated by modeling population genetic parameters on the basis of the mutational properties of microsatellites [22]; the differences between this estimate and our estimates are likely due to differences in methodology.

We observed a non-uniform distribution of a small fraction of mutations within autozygous segments that seems to provide evidence of recent allelic gene conversion. We observed three clusters of variants that were unlikely to have been generated by crossover mechanisms and might represent potential allelic gene conversion events in the autozygous segments, although one cluster (11 kb) was larger than expected for typical gene conversion events [23]. Only one of the ten SNVs in these clusters was in a CpG dinucleotide, and the GC content (0.36–0.50) of these three regions was not consistent with a model of recurrent mutation due to CpG methylation and demethylation. The average distance between heterozygous SNPs in these clusters was 2,723 bp, ruling out compound mutation [24] as a likely mechanism. Notably, one of these clusters of heterozygous SNVs consists of two new SNVs (not in dbSNP) and could be further evidence of a non-uniform distribution of new mutations similar to what we observed for de novo mutations. In addition, we observed an excess of heterozygous bases at dbSNP positions in autozygous segments (n = 22), most of which were not clustered with other heterozygous variants (16 of 22) but may also be evidence of recent gene conversion events.

Notably, we observed a non-uniform distribution (three clusters with two de novo SNVs within 10 kb; range 7–3,921 bp) (empirical P = 6 × 10⁻⁶) among the validated de novo events.
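The empirical P values quoted for clustering come from simulation. A minimal sketch of that kind of test is given below: variants are placed uniformly at random, adjacent pairs closer than 10 kb are counted, and the empirical P value is the fraction of replicates with at least as many close pairs as observed. The details are assumptions, since the simulation design is not described in this section. In particular, the callable sequence is treated as one contiguous interval, and `count_close_pairs` and `empirical_p` are hypothetical names.

```python
import random
from typing import List


def count_close_pairs(positions: List[int], max_gap: int = 10_000) -> int:
    """Number of adjacent variant pairs separated by less than max_gap bp."""
    ordered = sorted(positions)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a < max_gap)


def empirical_p(observed_pairs: int, n_variants: int, region_length: int,
                n_replicates: int = 10_000, max_gap: int = 10_000,
                seed: int = 0) -> float:
    """Fraction of uniform-placement replicates with >= observed close pairs.

    Simplification (assumption): one contiguous region of region_length bp.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        simulated = [rng.randrange(region_length) for _ in range(n_variants)]
        if count_close_pairs(simulated, max_gap) >= observed_pairs:
            hits += 1
    return hits / n_replicates


# Illustrative call: 72 variants in 512 Mb, asking how often 3 or more
# close pairs arise by chance (the inputs are placeholders, not a re-analysis).
p_value = empirical_p(observed_pairs=3, n_variants=72,
                      region_length=512_000_000, n_replicates=2_000)
```

With a finite number of replicates the smallest non-zero P value this can return is 1/n_replicates, so very small reported values require very many replicates, as in the 1,000,000-replicate simulation mentioned in the text.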
One of these clusters contained SNVs that were 7 bp apart, suggesting a compound mutational event [24]; on the basis of this event, 0.97% of de novo mutations were calculated to be part of multinucleotide mutations (95% CI (Wilson method) = 0.27–3.5%). Although this estimate is somewhat lower than the estimate of 2–3% of de novo mutations in compound mutational events, which was estimated on the basis of whole-genome sequencing data from two trios [24], the