An upper bound on the hardness of exact matrix based motif discovery

Cited by: 2
Authors
Horton, Paul [1]
Fujibuchi, Wataru [1]
Affiliation
[1] AIST, Computational Biology Research Center, Tokyo, Japan
Keywords
Motif discovery; Computational complexity; Combinatorics; Transcription factor binding site prediction; String algorithm
DOI
10.1016/j.jda.2006.10.006
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
Motif discovery is the problem of finding local patterns, or motifs, in a set of unlabeled sequences. One common representation of a motif is a Markov model known as a score matrix. Matrix-based motif discovery has been studied extensively, but no positive results on its theoretical hardness were previously known. We present the first non-trivial upper bound on the complexity (worst-case computation time) of this problem. Apart from linear terms, our bound depends only on the motif width $w$ (typically 5 to 20) and is a dramatic improvement over previously known bounds. We prove this bound by relating the motif discovery problem to a search problem over permutations of the set of all strings of length $w$, in which the permutations have a particular property. We give a constructive proof of an upper bound on the number of such permutations. For an alphabet of size $\sigma$ (typically 4), the trivial bound is $n! \approx (n/e)^n$, where $n = \sigma^w$. Our bound is roughly $n(\sigma \log_\sigma n)^n$. We relate this theoretical result to the exact motif discovery program TsukubaBB, whose algorithm contains ideas that inspired the result. We describe a recent improvement to TsukubaBB that can give a speedup by a factor of nine or more, and we use a dataset of REB1 transcription factor binding sites to illustrate that exact methods can indeed be used in some practical situations. © 2006 Published by Elsevier B.V.
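To make the comparison of bounds concrete, the following minimal Python sketch (not the authors' code and not TsukubaBB) evaluates the trivial bound $n!$, its approximation $(n/e)^n$, and the approximate new bound $n(\sigma \log_\sigma n)^n$ on a log10 scale for the typical DNA alphabet size $\sigma = 4$. It also illustrates, under an assumption of ours, how a score matrix relates to permutations of strings: we assume an additive position-specific matrix (one row per motif position, one column per letter) and take the permutations to be the rankings of all $\sigma^w$ strings by score. The abstract does not state the exact property the permutations must have, so the helper names `ranking` and `count_rankings` are hypothetical and purely illustrative.

```python
import itertools
import math
import random

# Hypothetical helper names; not taken from the paper or from TsukubaBB.

# --- Part 1: the bounds quoted in the abstract, on a log10 scale ---

def log10_trivial(sigma: int, w: int) -> float:
    """log10(n!) with n = sigma**w, via lgamma to avoid building a huge integer."""
    n = sigma ** w
    return math.lgamma(n + 1) / math.log(10)

def log10_stirling(sigma: int, w: int) -> float:
    """log10((n/e)**n), the approximation of n! quoted in the abstract."""
    n = sigma ** w
    return n * math.log10(n / math.e)

def log10_new_bound(sigma: int, w: int) -> float:
    """log10(n * (sigma * log_sigma(n))**n); log_sigma(n) equals w when n = sigma**w."""
    n = sigma ** w
    return math.log10(n) + n * math.log10(sigma * w)

# --- Part 2: ASSUMPTION for illustration: a score matrix induces a ranking of all w-mers ---

def ranking(matrix, alphabet):
    """All len(matrix)-mers sorted by descending additive score, ties broken lexicographically."""
    w = len(matrix)
    words = ["".join(p) for p in itertools.product(alphabet, repeat=w)]

    def score(word):
        # Sum of the per-position scores of each letter in the word.
        return sum(matrix[i][alphabet.index(c)] for i, c in enumerate(word))

    return tuple(sorted(words, key=lambda s: (-score(s), s)))

def count_rankings(alphabet, w, trials=20000, seed=0):
    """Distinct rankings seen over random score matrices (a sampled lower estimate, not an exact count)."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        matrix = [[rng.random() for _ in alphabet] for _ in range(w)]
        seen.add(ranking(matrix, alphabet))
    return len(seen)

if __name__ == "__main__":
    for w in (3, 5, 8):
        print(f"sigma=4, w={w}: log10 n! = {log10_trivial(4, w):.1f}, "
              f"log10 (n/e)^n = {log10_stirling(4, w):.1f}, "
              f"log10 new bound = {log10_new_bound(4, w):.1f}")
    print("distinct rankings, sigma=2, w=2:", count_rankings("AB", 2),
          "of", math.factorial(4), "orderings")
```

For $\sigma = 2$ and $w = 2$ there are $4! = 24$ orderings of the four strings, but an additive matrix without ties can realize at most 8 of them, because whether AA outranks BA already fixes whether AB outranks BB. Note also that for $\sigma = 4$ and $w \le 2$ the approximate form quoted in the abstract exceeds $n!$, so the comparison is only informative at realistic motif widths.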
Pages: 706-713
Page count: 8
Related papers
(50 in total)
  • [1] An upper bound on the hardness of exact matrix based motif discovery
    Horton, P
    Fujibuchi, W
    COMBINATORIAL PATTERN MATCHING, PROCEEDINGS, 2005, 3537 : 219 - 228
  • [2] Efficient exact motif discovery
    Marschall, Tobias
    Rahmann, Sven
    BIOINFORMATICS, 2009, 25 (12) : I356 - I364
  • [3] Parallel Exact Time Series Motif Discovery
    Narang, Ankur
    Bhattacherjee, Souvik
    EURO-PAR 2010 - PARALLEL PROCESSING, PART II, 2010, 6272 : 304 - 315
  • [4] Quick-Motif: An Efficient and Scalable Framework for Exact Motif Discovery
    Li, Yuhong
    Hou, Leong U.
    Yiu, Man Lung
    Gong, Zhiguo
    2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2015, : 579 - 590
  • [5] Multidimensional time series motif group discovery based on matrix profile
    Cao, Danyang
    Lin, Zifeng
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [6] Exact upper bound for sorting Rn with LE
    Kuppili, Sai Satwik
    Chitturi, Bhadrachalam
    DISCRETE MATHEMATICS ALGORITHMS AND APPLICATIONS, 2020, 12 (03)
  • [7] Efficient automatic exact motif discovery algorithms for biological sequences
    Karci, Ali
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (04) : 7952 - 7963
  • [8] An upper bound for the permanent of a nonnegative matrix
    Hwang, SG
    Krauter, AR
    Michael, TS
    LINEAR ALGEBRA AND ITS APPLICATIONS, 1998, 281 (1-3) : 259 - 263
  • [9] A new upper bound for the green matrix
    Nechepurenko, Yu.M.
    DOKLADY AKADEMII NAUK, 2001, 378 (04) : 450 - 452
  • [10] Matrix Profile VI: Meaningful Multidimensional Motif Discovery
    Yeh, Chin-Chia Michael
    Kavantzas, Nickolas
    Keogh, Eamonn
    2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2017, : 565 - 574