Discovering motifs in ranked lists of DNA sequences

Cited by: 512
Authors
Eden, Eran [1]
Lipson, Doron
Yogev, Sivan
Yakhini, Zohar
Affiliations
[1] Technion Israel Inst Technol, Dept Comp Sci, IL-32000 Haifa, Israel
[2] IBM Res Labs, Haifa, Israel
[3] Agilent Labs, Santa Clara, CA USA
DOI
10.1371/journal.pcbi.0030039
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. 
Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP-chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.
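The abstract does not spell out the statistic, so the following is only a minimal sketch of one way to score enrichment in a ranked list without fixing the target/background split in advance: try every cutoff and keep the smallest hypergeometric tail probability. The function names (`hypergeom_tail`, `min_hypergeom_score`) and the 0/1 occurrence encoding are illustrative assumptions, not DRIM's actual interface. The returned minimum is a raw score, not a p-value, because many cutoffs are tested; the exact p-value mentioned in the abstract is not reproduced here.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N items, B marked, n drawn)."""
    total = comb(N, n)
    upper = min(n, B)
    # math.comb returns 0 when n - k exceeds N - B, so impossible terms vanish.
    return sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1)) / total


def min_hypergeom_score(occurrence):
    """Scan all cutoffs of a ranked 0/1 list (best-ranked sequence first).

    occurrence[i] is 1 if the motif occurs in the i-th ranked sequence.
    Returns (smallest tail probability, cutoff at which it is attained).
    This raw minimum is NOT corrected for the multiple cutoffs tested.
    """
    N = len(occurrence)
    B = sum(occurrence)
    best_p, best_n = 1.0, 0
    b = 0  # motif occurrences among the top n sequences
    for n in range(1, N):
        b += occurrence[n - 1]
        p = hypergeom_tail(N, B, n, b)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


if __name__ == "__main__":
    # Hypothetical toy input: the motif occurs only in the top 3 of 10 sequences.
    ranked = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    score, cutoff = min_hypergeom_score(ranked)
    print(score, cutoff)
```

Scanning the cutoff lets the data choose the partition, which is the concern raised in point (i) of the abstract; the price is that the minimum must be corrected before it can be read as a significance level.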
Pages: 508-522
Page count: 15
Related papers
50 in total
  • [41] New scoring schema for finding motifs in DNA Sequences
    Fatemeh Zare-Mirakabad
    Hayedeh Ahrabian
    Mehdei Sadeghi
    Abbas Nowzari-Dalini
    Bahram Goliaei
    BMC Bioinformatics, 10
  • [42] Evaluation Measures for Relevance and Credibility in Ranked Lists
    Lioma, Christina
    Simonsen, Jakob Grue
    Larsen, Birger
    ICTIR'17: PROCEEDINGS OF THE 2017 ACM SIGIR INTERNATIONAL CONFERENCE THEORY OF INFORMATION RETRIEVAL, 2017, : 91 - 98
  • [43] An Inference and Integration Approach for the Consolidation of Ranked Lists
    Schimek, Michael G.
    Mysickova, Alena
    Budinska, Eva
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2012, 41 (07) : 1152 - 1166
  • [44] Network Selection: A Method for Ranked Lists Selection
    Cutillo, Luisa
    Carissimo, Annamaria
    Figini, Silvia
    PLOS ONE, 2012, 7 (08):
  • [45] Learning to Truncate Ranked Lists for Information Retrieval
    Wu, Chen
    Zhang, Ruqing
    Guo, Jiafeng
    Fan, Yixing
    Lan, Yanyan
    Cheng, Xueqi
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 4453 - 4461
  • [46] Discovering patterns in microsatellite flanks with evolutionary computation by evolving discriminatory DNA motifs
    Meade, A
    Corne, D
    Sibly, R
    CEC'02: PROCEEDINGS OF THE 2002 CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1 AND 2, 2002, : 1 - 6
  • [47] Finding Motifs in A Set of DNA Sequences: A Dynamic Programming Approach
    Li, Zhen-Hao
    Zheng, Xiao-Juan
    Guan, Ji-Wen
    PROCEEDINGS OF 2009 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-6, 2009, : 198 - +
  • [48] The Improvement and Implementation in the Algorithm of Finding Maximal Motifs in DNA Sequences
    Fu Yifan
    Zhou Dongdai
    Zhong Shaochun
    Zhao Ruiqing
    2008 3RD INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEM AND KNOWLEDGE ENGINEERING, VOLS 1 AND 2, 2008, : 176 - +
  • [49] Finding subtle motifs with variable gaps in unaligned DNA sequences
    Hu, YJ
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2003, 70 (01) : 11 - 20
  • [50] An efficient algorithm for the identification of structured motifs in DNA promoter sequences
    Carvalho, AM
    Freitas, AT
    Oliveira, AL
    Sagot, MF
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2006, 3 (02) : 126 - 140