An optimal algorithm for maximum-sum segment and its application in bioinformatics

Authors
Fan, TH
Lee, SF
Lu, HI [1]
Tsou, TS
Wang, TC
Yao, A
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
[2] Natl Cent Univ, Inst Stat, Chungli 320, Taiwan
[3] Acad Sinica, Inst Biomed Sci, Taipei 115, Taiwan
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We study a fundamental sequence algorithm arising from bioinformatics. Given two integers L and U and a sequence A of n numbers, the maximum-sum segment problem is to find a segment A[i, j] of A with L ≤ j - i + 1 ≤ U that maximizes A[i] + A[i+1] + ... + A[j]. The problem has applications in finding repeats, designing low-complexity filters, and locating segments with rich C+G content in biomolecular sequences. The best previously known algorithm, due to Lin, Jiang, and Chao, runs in O(n) time and is based on a clever technique called the left-negative decomposition of A. In the present paper, we give a new O(n)-time algorithm that bypasses the left-negative decomposition. As a result, our algorithm can process the input sequence in an online manner, an important feature for coping with genome-scale sequences. We also show how to exploit sparsity in the input sequence: if A is representable in O(k) space in some format, then our algorithm runs in O(k) time. Moreover, a practical implementation of our algorithm, run on the rice genome, helped us identify a very long repeat structure in rice chromosome 1 that was previously unknown.
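To make the problem statement concrete, here is a minimal sketch that solves the length-constrained maximum-sum segment problem in O(n) time in a single left-to-right pass. It is NOT the authors' algorithm and NOT the left-negative decomposition of Lin, Jiang, and Chao; it swaps in a common prefix-sum plus monotone-deque (sliding-window minimum) technique, and it does not implement the O(k) sparse-input variant. It assumes 0-based inclusive indices. The names `max_sum_segment` and `gc_scores`, and the +1/-1 scoring used to illustrate the C+G application, are hypothetical illustrations.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def max_sum_segment(a: Sequence[float], lo: int, hi: int) -> Optional[Tuple[float, int, int]]:
    """Return (best_sum, i, j), 0-based inclusive, with lo <= j - i + 1 <= hi and
    a[i] + ... + a[j] maximal; return None if no segment of allowed length exists.

    Sketch only: prefix sums plus a monotone deque (sliding-window minimum),
    not the method described in the paper.
    """
    if lo < 1 or hi < lo:
        raise ValueError("need 1 <= lo <= hi")
    prefix: List[float] = [0]   # prefix[k] = a[0] + ... + a[k-1]
    window: deque = deque()     # candidate start indices k, prefix[k] strictly increasing
    best: Optional[Tuple[float, int, int]] = None
    for j, x in enumerate(a):   # each element is consumed once, as it arrives
        prefix.append(prefix[-1] + x)
        # A segment ending at j is a[k..j], with sum prefix[j+1] - prefix[k]
        # and length j + 1 - k, so valid starts satisfy j+1-hi <= k <= j+1-lo.
        k_new = j + 1 - lo
        if k_new >= 0:
            # Drop starts whose prefix value is no smaller than the new one:
            # they can never give a larger sum than k_new from now on.
            while window and prefix[window[-1]] >= prefix[k_new]:
                window.pop()
            window.append(k_new)
        # Evict starts that would make the segment longer than hi.
        while window and window[0] < j + 1 - hi:
            window.popleft()
        if window:
            k = window[0]       # start with the minimum prefix value in range
            s = prefix[j + 1] - prefix[k]
            if best is None or s > best[0]:
                best = (s, k, j)
    return best


def gc_scores(seq: str, bonus: float = 1.0, penalty: float = -1.0) -> List[float]:
    """Hypothetical scoring for the C+G application: C/G -> bonus, anything else -> penalty."""
    return [bonus if ch in "CGcg" else penalty for ch in seq]


if __name__ == "__main__":
    print(max_sum_segment([2, -3, 4, -1, 2, -5, 3], 2, 3))   # (5, 2, 4)
    print(max_sum_segment(gc_scores("ATGCGCAT"), 2, 4))       # (4.0, 2, 5)
```

In the first usage line, the best segment of length 2 or 3 is A[2..4] = 4 - 1 + 2 = 5. Each index enters and leaves the deque at most once, which gives the linear running time; the sketch keeps the whole prefix array for simplicity, though only the most recent hi + 1 prefix values are ever needed.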
Pages: 251-257
Number of pages: 7