An upper bound on the hardness of exact matrix based motif discovery

Cited by: 2
Authors
Horton, Paul [1]
Fujibuchi, Wataru [1]
Affiliations
[1] AIST, Computat Biol Res Ctr, Tokyo, Japan
Keywords
Motif discovery; Computational complexity; Combinatorics; Transcription factor binding site prediction; String algorithm
DOI
10.1016/j.jda.2006.10.006
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Motif discovery is the problem of finding local patterns or motifs from a set of unlabeled sequences. One common representation of a motif is a Markov model known as a score matrix. Matrix based motif discovery has been extensively studied but no positive results have been known regarding its theoretical hardness. We present the first non-trivial upper bound on the complexity (worst-case computation time) of this problem. Other than linear terms, our bound depends only on the motif width $w$ (which is typically 5-20) and is a dramatic improvement relative to previously known bounds. We prove this bound by relating the motif discovery problem to a search problem over permutations of strings of length $w$, in which the permutations have a particular property. We give a constructive proof of an upper bound on the number of such permutations. For an alphabet size of $\sigma$ (typically 4) the trivial bound is $n! \approx (n/e)^n$, $n = \sigma^w$. Our bound is roughly $n(\sigma \log_\sigma n)^n$. We relate this theoretical result to the exact motif discovery program, TsukubaBB, whose algorithm contains ideas which inspired the result. We describe a recent improvement to the TsukubaBB program which can give a speed up of nine or more and use a dataset of REB1 transcription factor binding sites to illustrate that exact methods can indeed be used in some practical situations. © 2006 Published by Elsevier B.V.
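To give a rough sense of scale for the two bounds quoted in the abstract, the sketch below compares them on a log10 scale. It assumes the bounds read as n! (trivial) and n(σ log_σ n)^n (improved), with n = σ^w; that reading is reconstructed from the abstract text and the paper itself should be consulted for the exact statement. The function names are hypothetical and chosen only for this illustration.

```python
import math


def log10_trivial_bound(sigma: int, w: int) -> float:
    """log10 of n!, where n = sigma**w, computed via the log-gamma function."""
    n = sigma ** w
    return math.lgamma(n + 1) / math.log(10)


def log10_improved_bound(sigma: int, w: int) -> float:
    """log10 of n * (sigma * log_sigma(n))**n, where n = sigma**w.

    Since n = sigma**w, log_sigma(n) is exactly w, so the base is sigma * w.
    The form of this bound is an assumption taken from the abstract.
    """
    n = sigma ** w
    return math.log10(n) + n * math.log10(sigma * w)


if __name__ == "__main__":
    sigma = 4  # typical alphabet size for DNA, as stated in the abstract
    for w in (5, 8, 10):
        trivial = log10_trivial_bound(sigma, w)
        improved = log10_improved_bound(sigma, w)
        print(f"w={w:2d}  log10(n!) = {trivial:.1f}  "
              f"log10(improved bound) = {improved:.1f}")
```

Working in logarithms avoids overflow, since both quantities are far too large to hold in a floating-point number even for modest w. Both bounds remain astronomically large; the comparison only illustrates how much smaller the improved bound is than n! once w is in the typical range.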
Pages: 706-713
Number of pages: 8