
Bioinformatics Lecture 3 - 20020925

III. Comparative study of complete genomes is a new direction

Where did life originate? How did life evolve? How did the genetic code originate? How many genes does the smallest free-living organism need, and how do those genes bring it to life? For example, the mouse and human genomes are similar in size, each containing about three billion base pairs, and their gene counts are also similar. Yet mouse and human differ enormously. Why? Likewise, some scientists estimate that the genomes of different human populations differ by only 0.1%, and that the human and ape genomes differ by about 1%, yet the phenotypic differences are striking. Why is that? Comparative study of complete genome sequences is an important route to answering these questions.

Distribution of mouse homologous genes across human chromosomes (data from GenBank, coordinated by R.S. Chen)

*************************************************************************
Mouse chromosome    Human chromosomes carrying homologs of its genes
1                   1, 2, 5, 6, 8, 13, 18
2                   2, 7, 9, 10, 11, 15, 20
3                   1, 3, 4, 8
4                   1, 6, 8, 9
5                   1, 4, 7, 12, 13, 18, 22
6                   2, 3, 7, 10, 12
7                   6, 10, 11, 15, 16, 19
8                   1, 4, 8, 13, 16, 19
9                   3, 6, 11, 15, 19
10                  6, 10, 12, 19, 21, 22
11                  2, 5, 7, 16, 17, 22
12                  2, 7, 14
13                  1, 5, 6, 7, 9, 15, 17
14                  3, 8, 10, 13, 14, X
15                  5, 8, 12, 22
16                  3, 8, 16, 21, 22
17                  6, 16, 19, 21
18                  5, 10, 18
19                  9, 10, 11, X
X                   X
*************************************************************************

Study on conservation of gene order in complete genomes

We analyzed the gene order of 70 ribosomal proteins in 16 complete genomes. These genes form 9-14 operons in each genome. The results show that: (1) the rpL3 and rpL4 operons contain more than 20 ribosomal protein genes, and the order of these genes is highly conserved in both Eubacteria and Archaebacteria; (2) some operon structures are specific to Eubacteria or to Archaebacteria; (3) within each kingdom, differences in gene order between species can be used to infer the evolutionary relationships of those species. This method provides a new way to study the evolutionary relationships of ancient species.
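One simple way to put a number on "conserved gene order" is to count how many neighbouring gene pairs in one genome are also neighbours in another. The sketch below illustrates that idea only; it is not the procedure used in the study above, and the gene lists and function names are hypothetical.

```python
def adjacent_pairs(gene_order):
    """Return the set of unordered neighbouring gene pairs in one gene order."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def shared_adjacency_fraction(order_a, order_b):
    """Fraction of neighbouring pairs in order_a that are also neighbours in order_b."""
    pairs_a = adjacent_pairs(order_a)
    pairs_b = adjacent_pairs(order_b)
    if not pairs_a:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a)


# Hypothetical gene orders, for illustration only (not data from the study).
genome_a = ["rpL3", "rpL4", "rpL23", "rpL2", "rpS19"]
genome_b = ["rpL3", "rpL4", "rpL23", "rpS19", "rpL2"]

print(shared_adjacency_fraction(genome_a, genome_b))
```

A value near 1 means the two genomes keep almost the same neighbours. Pairwise values of this kind could then serve as a rough similarity score between species, under the assumption that the same gene names denote orthologous genes.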
* chromosome 13 are relatively stable, for instance, whereas chromosome 12 in men and chromosome 16 in women are enormously fickle.
* why vertebrates have four times as many HOX genes, a group of key developmental genes, as do fruit flies.

IV. Current problems facing sequence-based studies of biological evolution

Since the publication of Darwin's Origin of Species in 1859, the theory of evolution has been one of the most significant contributions to the development of natural science and natural philosophy. The core of evolutionary research is to describe the history of biological evolution (the phylogenetic tree) and to explore the mechanisms of the evolutionary process. Since the middle of the twentieth century, as molecular biology has developed, the study of evolution has moved to the molecular level. Molecular evolution is now an important tool of evolutionary research, and a set of theoretical methods based on nucleic acid and protein sequence information has been established:

* Sequence similarity comparison. The sequence under study is compared against DNA or protein sequence databases to determine its biological identity, that is, to find which known sequences resemble it. This step needs only a pairwise sequence comparison algorithm. Commonly used packages include BLAST and FASTA.
* Sequence homology analysis. The sequence under study is added to a set of homologous sequences from different species, and all of them are compared simultaneously to determine how closely the sequence is related to the others. This is the most critical step of the analysis and requires a multiple sequence comparison algorithm. Commonly used packages include CLUSTAL.
* Phylogenetic tree construction. From the results of the homology analysis, a tree reflecting the evolutionary relationships among the species is reconstructed. Several software packages have been developed for this, such as PHYLIP and MEGA.
* Stability test. To check the reliability of the constructed tree, a statistical test is needed. The construction is usually repeated at random hundreds or thousands of times, and only branch points that appear with high probability (above 70%) are considered reliable. The usual method is the bootstrap algorithm, and the corresponding software is included in the tree-building packages.

For convenience, Table 3 lists the Internet addresses of software for evolutionary analysis.

Table 3. Internet addresses of software for evolutionary analysis

********************************************************
Sequence analysis and multiple sequence comparison
BLAST Web site          http://www.ncbi.nlm.nih.gov/BLAST/
FASTA at EBI            http://www2.ebi.ac.uk/fasta3/
CLUSTALW software       ftp://ftp-igbmc.u-strasbg.fr/pub/ClustalW
HMMER software          http://hmmer.wustl.edu/
SAM profile software    http://www.cse.ucsc.edu/research/compbio/sam.html
BCM Search Launcher     http://kiwi.imgen.bcm.tmc.edu:8088/searchlauncher/launcher.html

Phylogenetic tree construction and stability analysis
PHYLIP                  http://evolution.genetics.washington.edu/phylip.html
Hennig86                http://www.vims.edu/~mes/hennig/software.html
MEGA/METREE             http://www.bio.psu.edu/faculty/nei/imeg
GAMBIT                  http://www.lifesci.ucla.edu/mcdbio/Faculty/Lake/Research/Programs/
MacClade                http://phylogeny.arizona.edu/macclade/macclade.html
PAUP                    http://onyx.si.edu/PAUP/
GCG software package    http://www.gcg.com/
********************************************************

* human genome shares 223 genes with bacteria--genes that do not exist in the worm, fly, or yeast.
* A reticulated tree, or net, which might more appropriately represent life's history.

More and more cases of LGT (lateral gene transfer) have been discovered and reported. Some estimate that 1.5%~14.5% of the genes in a genome are related to LGT, and even rRNA molecules are involved in LGT:
Garcia-Vallvé S, Romeu A, Palau J., Genome Res, 2000, 10(11): 1719~1725
Yap W H, Zhang Z, Wang Y., J. Bacteriol. 1999, 181: 5201~5209

Some argue that it is impossible to reconstruct a universal tree of life:
Pennisi E., Science, 1999, 284: 1305~1307
Doolittle R F., Nature, 1998, 392: 339~342

As more whole-genome sequences and related data become available, it becomes possible to reconsider the phylogeny and clustering properties of species with broader measures, even at the level of the whole genome.

Phylogeny Based on Whole Genome as inferred from Complete Information Set Analysis (CISA)

We present a new method, based on information theory, for calculating the phylogenetic distance between biological sequences. It is applied to 16S ribosomal RNA (used to validate the method), to 24 completely sequenced genomes, and to all of their predicted ORF products, and phylogenies of the genomes and proteomes are built with the neighbor-joining algorithm. It is already recognized that no other biological sequence carries more phylogenetic information than the genome.
However, previous algorithms cannot handle such megabase-scale nucleic acid or amino acid sequences, whose lengths are in most cases unequal.

Methods

Let $\Sigma$ be an alphabet of $m$ symbols, and suppose $S = \{S_1, \dots, S_s\}$ is a set of sequences formed from the symbol set $\Sigma$. We denote the set of all different sequences of length $l$ formed from $\Sigma$ by $\Theta_l$; then the number $m(l)$ of all sequences in $\Theta_l$ equals $m^l$. For a sequence $S_k$, let $L_k$ be its length and let $n_i^{lk}$ denote the number of contiguous subsequences in $S_k$ that match the $i$-th sequence of $\Theta_l$, $l \le L_k$. It is easy to see that $\sum_i n_i^{lk} = L_k - l + 1$ for each $l \le L_k$ and $k$. Letting $p_i^{lk} = n_i^{lk} / (L_k - l + 1)$, we obtain a distribution $U_k^l = (p_1^{lk}, \dots, p_{m(l)}^{lk})$, where $\sum_i p_i^{lk} = 1$. Thus, for each sequence $S_k$, we get a unique set of distributions $(U_k^1, U_k^2, \dots, U_k^{L_k})$. This set contains all primary information of a sequence: in particular, $U_k^{L_k}$ uniquely determines the original sequence, so we call this set a complete information set of the sequence $S_k$.

A function for measuring information discrepancy, abbreviated FDOD, has been introduced [9, 11]. To develop a discrepancy measure for sequences, a measure based on the FDOD function [10] is used [formula not preserved in the source], where $0 \log 0$ is defined as 0, as in the Kullback-Leibler entropy (Kullback, 1959); $s$ denotes the number of sequences and $l$ denotes the window size.

The FDOD function is characterized by an axiom set similar to Shannon's axioms: non-negativity, symmetry, continuity, identity and symmetric recursiveness. For $s$ distributions, the FDOD function also has the following properties: boundedness, maximum, convexity, monotonicity, and so on. With this measure, the maximum discrepancy between any two sequences is less than or equal to 1, and the minimum is 0.

[Figure] Phylogeny of 23 completely sequenced Bacteria and Archaea species on the basis of 16S rRNA. A) Phylogenetic tree built by our new method. B) Phylogenetic tree built by the ClustalW program.
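The window-based distributions described in the Methods above can be sketched in a few lines. The first function follows the text directly: it counts every contiguous length-l subsequence and divides by L - l + 1. The second function is only a stand-in, because the FDOD formula itself is not preserved in this transcript: it assumes a discrepancy of the "divergence from the mean distribution" kind, scaled so that the result lies between 0 and 1 as the text states. All function names are hypothetical.

```python
import math
from collections import Counter


def lmer_distribution(seq, l):
    """Relative frequencies of all contiguous length-l subsequences of seq."""
    total = len(seq) - l + 1  # number of windows, L - l + 1
    if total <= 0:
        raise ValueError("window size l must not exceed the sequence length")
    counts = Counter(seq[i:i + l] for i in range(total))
    return {word: n / total for word, n in counts.items()}


def discrepancy(distributions):
    """Assumed stand-in measure: divergence of each distribution from the
    mean distribution, scaled by s*log(s) so the result lies in [0, 1]."""
    s = len(distributions)
    if s < 2:
        raise ValueError("need at least two distributions")
    total = 0.0
    for dist in distributions:
        # Words absent from dist are skipped, which treats 0*log 0 as 0.
        for word, p in dist.items():
            mean = sum(d.get(word, 0.0) for d in distributions) / s
            total += p * math.log(p / mean)
    return total / (s * math.log(s))


# Toy sequences of unequal length, for illustration only.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTTGA"
window = 2
print(discrepancy([lmer_distribution(seq_a, window),
                   lmer_distribution(seq_b, window)]))
```

Because each sequence is reduced to a frequency distribution over the same set of words, sequences of different lengths can be compared without alignment, which is the property the text emphasizes. A matrix of such pairwise values could then be passed to a neighbor-joining program.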
[Figure] Phylogeny of 24 completely sequenced Bacteria, Archaea and Eukarya species. A) Genomic tree.

[Figure] Phylogeny of T. tengcongensis based on whole genome

The Composition of Proteins with Different Functions (COG) in a Whole Proteome Reveals the Organism's Phylogeny and Clustering Properties

[Figure] The composition differences of proteins of COG classes for 36 organisms

Method

We took the 17 functional classes of COGs (Clusters of Orthologous Groups) as the basic classes of protein function and constructed a 17-D protein_vector to describe the potential functions of each protein. By summing all protein_vectors belonging to a proteome and then normalizing the sum, we obtained a 17-D "Proteome_Vector" reflecting the composition of proteins with different functions in the proteome. Treating these 17-D Proteome_Vectors as "characteristic vectors" of the organisms, we investigated the clustering properties and phylogenetic relationships of the 36 species (8 Archaea, 24 Bacteria and 4 Eukarya) whose genome sequences and related annotations were available at the time.

The average distances of Proteome_Vector

Intra-kingdom species
Archaea:            22.79
Bacteria:           40.73
Eukarya:            19.91

Inter-kingdom species
Archaea-Bacteria:   43.23
Archaea-Eukarya:    39.97
Bacteria-Eukarya:   45.67
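The Proteome_Vector construction described in the Method above (sum the 17-D protein_vectors, then normalize) can be sketched as follows. The source does not state how a protein_vector is filled in, which normalization was used, or which distance was computed, so the one-hot vectors, the scaling to percentages, and the Euclidean distance below are all assumptions made for illustration. The function names and toy data are hypothetical.

```python
import math

N_CLASSES = 17  # number of COG functional classes named in the text


def one_hot(class_index):
    """Hypothetical protein_vector: 1 in the protein's COG class, 0 elsewhere."""
    vector = [0.0] * N_CLASSES
    vector[class_index] = 1.0
    return vector


def proteome_vector(protein_vectors):
    """Sum the protein_vectors and normalise to percentages (assumed scaling)."""
    if not protein_vectors:
        raise ValueError("a proteome needs at least one protein_vector")
    totals = [sum(column) for column in zip(*protein_vectors)]
    grand_total = sum(totals)
    return [100.0 * value / grand_total for value in totals]


def euclidean_distance(u, v):
    """Assumed distance between two Proteome_Vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy proteomes with made-up class assignments, for illustration only.
proteome_a = proteome_vector([one_hot(0), one_hot(0), one_hot(1), one_hot(16)])
proteome_b = proteome_vector([one_hot(0), one_hot(1), one_hot(1), one_hot(16)])
print(euclidean_distance(proteome_a, proteome_b))
```

Averaging such pairwise distances within and between kingdoms would give a summary of the same shape as the table of average distances, though the numbers depend on the assumed normalization and distance.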