An upper bound on the hardness of exact matrix based motif discovery

被引:2
|
作者
Horton, Paul [1 ]
Fujibuchi, Wataru [1 ]
机构
[1] AIST, Computat Biol Res Ctr, Tokyo, Japan
关键词
Motif discovery; Computational complexity; Combinatorics; Transcription factor binding site prediction; String algorithm;
D O I
10.1016/j.jda.2006.10.006
中图分类号
O29 [应用数学];
学科分类号
070104 ;
摘要
Motif discovery is the problem of finding local patterns or motifs from a set of unlabeled sequences. One common representation of a motif is a Markov model known as a score matrix. Matrix based motif discovery has been extensively studied but no positive results have been known regarding its theoretical hardness. We present the first non-trivial upper bound on the complexity (worst-case computation time) of this problem. Other than linear terms, our bound depends only on the motif width w (which is typically 5-20) and is a dramatic improvement relative to previously known bounds. We prove this bound by relating the motif discovery problem to a search problem over permutations of strings of length w, in which the permutations have a particular property. We give a constructive proof of an upper bound on the number of such permutations. For an alphabet size of sigma (typically 4) the trivial bound is n! approximate to (n/e)(n), n = sigma(w). Our bound is roughly n(sigma log(sigma) n)(n). We relate this theoretical result to the exact motif discovery program, TsukubaBB, whose algorithm contains ideas which inspired the result. We describe a recent improvement to the TsukubaBB program which can give a speed up of nine or more and use a dataset of REB1 transcription factor binding sites to illustrate that exact methods can indeed be used in some practical situations. (C) 2006 Published by Elsevier B.V.
引用
收藏
页码:706 / 713
页数:8
相关论文
共 50 条