Methods in comparative genomics: Genome correspondence, gene identification and regulatory motif discovery

Cited by: 61
Authors
Kellis, M
Patterson, N
Birren, B
Berger, B
Lander, ES
Affiliations
[1] MIT, Dept Math, Cambridge, MA 02139 USA
[2] MIT, Dept Biol, Cambridge, MA 02139 USA
[3] MIT, Whitehead Inst Ctr Genome Res, Cambridge, MA 02139 USA
[4] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Keywords
comparative genomics; computational biology; yeast; Saccharomyces cerevisiae; genome alignment; gene finding; gene identification; gene regulation; regulatory motifs; motif discovery
DOI
10.1089/1066527041410319
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In Kellis et al. (2003), we reported the genome sequences of S. paradoxus, S. mikatae, and S. bayanus and compared these three yeast species to their close relative, S. cerevisiae. Genomewide comparative analysis allowed the identification of functionally important sequences, both coding and noncoding. In this companion paper we describe the mathematical and algorithmic results underpinning the analysis of these genomes. (1) We present methods for the automatic determination of genome correspondence. The algorithms enabled the automatic identification of orthologs for more than 90% of genes and intergenic regions across the four species despite the large number of duplicated genes in the yeast genome. The remaining ambiguities in the gene correspondence revealed recent gene family expansions in regions of rapid genomic change. (2) We present methods for the identification of protein-coding genes based on their patterns of nucleotide conservation across related species. We observed the pressure to conserve the reading frame of functional proteins and developed a test for gene identification with high sensitivity and specificity. We used this test to revisit the genome of S. cerevisiae, reducing the overall gene count by 500 genes (10% of previously annotated genes) and refining the gene structure of hundreds of genes. (3) We present novel methods for the systematic de novo identification of regulatory motifs. The methods do not rely on previous knowledge of gene function and in that way differ from the current literature on computational motif discovery. Based on genomewide conservation patterns of known motifs, we developed three conservation criteria that we used to discover novel motifs. We used an enumeration approach to select strongly conserved motif cores, which we extended and collapsed into a small number of candidate regulatory motifs. These include most previously known regulatory motifs as well as several noteworthy novel motifs. The majority of discovered motifs are enriched in functionally related genes, allowing us to infer a candidate function for novel motifs. Our results demonstrate the power of comparative genomics to further our understanding of any species. Our methods are validated by the extensive experimental knowledge in yeast and will be invaluable in the study of complex genomes like that of the human.
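The reading-frame idea in point (2) of the abstract can be illustrated with a minimal sketch. It is not the authors' actual test, whose details the abstract does not give. It assumes a pairwise nucleotide alignment supplied as two equal-length gapped strings and scores the fraction of aligned positions at which the two sequences remain in the same frame. The function name `frame_conserved_fraction` and the scoring rule are hypothetical, chosen only for illustration.

```python
def frame_conserved_fraction(ref_aln: str, other_aln: str) -> float:
    """Fraction of ungapped aligned columns where both sequences share a frame.

    Hypothetical illustration: '-' marks a gap. A column counts as in-frame
    when the number of bases consumed so far in each sequence differs by a
    multiple of 3.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")

    ref_pos = 0      # bases consumed in the reference sequence
    other_pos = 0    # bases consumed in the compared sequence
    aligned = 0      # columns with a base in both sequences
    in_frame = 0     # aligned columns where the frame offset is 0 mod 3

    for r, o in zip(ref_aln, other_aln):
        if r != "-" and o != "-":
            aligned += 1
            if (ref_pos - other_pos) % 3 == 0:
                in_frame += 1
        if r != "-":
            ref_pos += 1
        if o != "-":
            other_pos += 1

    return in_frame / aligned if aligned else 0.0


# A 3-base deletion keeps the frame; a 1-base deletion shifts it.
codon_gap = frame_conserved_fraction("ATGAAACCCGGG", "ATG---CCCGGG")
single_gap = frame_conserved_fraction("ATGAAACCCGGG", "ATG-AACCCGGG")
```

Under this sketch, a gap whose length is a multiple of three leaves the score high, while a single-base gap puts every downstream position out of frame. That asymmetry is the kind of signal a frame-conservation test for protein-coding genes would rely on.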
Pages: 319-355
Page count: 37
Related papers
50 in total
  • [21] Analysis of gene regulatory sequences by knowledge discovery methods
    Pozdnyakov, M. A.
    Orlov, Yu L.
    Vishnevsky, O. V.
    Proscura, A. L.
    Vityaev, E. E.
    Arrigo, P.
    PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE, VOL 1, 2004, : 170 - 173
  • [22] Schistosoma comparative genomics: integrating genome structure, parasite biology and anthelmintic discovery
    Swain, Martin T.
    Larkin, Denis M.
    Caffrey, Conor R.
    Davies, Stephen J.
    Loukas, Alex
    Skelly, Patrick J.
    Hoffmann, Karl F.
    TRENDS IN PARASITOLOGY, 2011, 27 (12) : 555 - 564
  • [23] Eukaryotic regulatory element conservation analysis and identification using comparative genomics
    Liu, YY
    Liu, XS
    Wei, LP
    Altman, RB
    Batzoglou, S
    GENOME RESEARCH, 2004, 14 (03) : 451 - 458
  • [24] Comparative genomics of Steinernema reveals deeply conserved gene regulatory networks
    Adler R. Dillman
    Marissa Macchietto
    Camille F. Porter
    Alicia Rogers
    Brian Williams
    Igor Antoshechkin
    Ming-Min Lee
    Zane Goodwin
    Xiaojun Lu
    Edwin E. Lewis
    Heidi Goodrich-Blair
    S. Patricia Stock
    Byron J. Adams
    Paul W. Sternberg
    Ali Mortazavi
    Genome Biology, 16
  • [25] Genome-wide identification of replication origins in yeast by comparative genomics
    Nieduszynski, Conrad A.
    Knox, Yvonne
    Donaldson, Anne D.
    GENES & DEVELOPMENT, 2006, 20 (14) : 1874 - 1879
  • [26] Comparative genomics of Steinernema reveals deeply conserved gene regulatory networks
    Dillman, Adler R.
    Macchietto, Marissa
    Porter, Camille F.
    Rogers, Alicia
    Williams, Brian
    Antoshechkin, Igor
    Lee, Ming-Min
    Goodwin, Zane
    Lu, Xiaojun
    Lewis, Edwin E.
    Goodrich-Blair, Heidi
    Stock, S. Patricia
    Adams, Byron J.
    Sternberg, Paul W.
    Mortazavi, Ali
    GENOME BIOLOGY, 2015, 16
  • [27] PLAZA: A Comparative Genomics Resource to Study Gene and Genome Evolution in Plants
    Proost, Sebastian
    Van Bel, Michiel
    Sterck, Lieven
    Billiau, Kenny
    Van Parys, Thomas
    Van de Peer, Yves
    Vandepoele, Klaas
    PLANT CELL, 2009, 21 (12) : 3718 - 3731
  • [28] Software for analysis of gene regulatory sequences by knowledge discovery methods
    Vityaev, R
    Shipilov, TI
    Pozdnyakov, MA
    Vishnevsky, OV
    Proscura, AL
    Orlov, YL
    Arrigo, P
    BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE II, 2006, : 491 - 498
  • [29] Comparative genomics in erythropoietic gene discovery: Synergisms between the Antarctic icefishes and the zebrafish
    Detrich, HW
    Yergeau, DA
    ZEBRAFISH:2ND EDITION GENETICS GENOMICS AND INFORMATICS, 2004, 77 : 475 - +
  • [30] Resequencing and Comparative Genomics of Stagonospora nodorum: Sectional Gene Absence and Effector Discovery
    Syme, Robert Andrew
    Hane, James K.
    Friesen, Timothy L.
    Oliver, Richard P.
    G3-GENES GENOMES GENETICS, 2013, 3 (06) : 959 - 969