Compressed pattern matching in DNA sequence

Cited by: 0
Authors
Chen, L [1]
Lu, SY [1]
Ram, J [1]
Affiliations
[1] Wayne State Univ, Detroit, MI 48202 USA
Keywords
DOI
Not available
Chinese Library Classification
Q81 [Bioengineering (biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
We propose derivative Boyer-Moore (d-BM), a new compressed pattern matching algorithm for DNA sequences. The algorithm is based on the Boyer-Moore method, one of the most popular string matching algorithms. In this approach, we compress both DNA sequences and patterns by using two bits to represent each A, T, C, G character. Experiments indicate that this compressed pattern matching algorithm searches long DNA patterns (length > 50) more than 10 times faster than the exact match routine of the software package Agrep, which is known as the fastest pattern matching tool. Moreover, compression of DNA sequences by this method gives a guaranteed space saving of 75%. The enhanced speed of the algorithm is due in part to the increased efficiency of the Boyer-Moore method resulting from an increase in alphabet size from 4 to 256.
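The two-bit encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' d-BM implementation: the particular bit assignment (A=00, C=01, G=10, T=11), the choice to place the first base in the high bits, and the names `pack` and `unpack` are all assumptions made for illustration. It shows only the packing step, not the search.

```python
# Hypothetical 2-bit code; the abstract does not specify the actual mapping.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i is the base whose code is i


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (first base in the high bits)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so the padding sits in the low bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


seq = "ACGTTGCAAC"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")
print(unpack(packed, len(seq)) == seq)
```

Under this scheme, 1000 bases stored at one byte per character shrink to 250 bytes, which is the 75% saving the abstract cites. Each packed byte can take any of 256 values, which is the larger alphabet the abstract credits for part of the speedup.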
Pages: 62-68
Page count: 7
Related papers
50 in total
  • [1] Compressed Pattern Matching in DNA Sequences
    Kanchana, N.
    Sarala, S.
    [J]. PROCEEDINGS OF 2010 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, VOL 9 (ICCSIT 2010), 2010, : 157 - 160
  • [2] Approximate Pattern Matching for DNA Sequence Data
    Patil, Nagamma
    Toshniwal, Durga
    Garg, Kumkum
    [J]. COMPUTER NETWORKS AND INFORMATION TECHNOLOGIES, 2011, 142 : 212 - 218
  • [3] Compressed Pattern Matching in DNA Sequences Using Multithreaded Technology
    Lin, Piyuan
    Liu, Shaopeng
    Zhang, Lixia
    Huang, Peijie
    [J]. 2009 3RD INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING, VOLS 1-11, 2009, : 165 - 168
  • [4] An Efficient DNA Sequence Compression using Small Sequence Pattern Matching
    Murugan, A.
    Punitha, K.
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2021, 21 (08): : 281 - 287
  • [5] Compressed pattern matching for SEQUITUR
    Mitarai, S
    Hirao, M
    Matsumoto, T
    Shinohara, A
    Takeda, M
    Arikawa, S
    [J]. DCC 2001: DATA COMPRESSION CONFERENCE, PROCEEDINGS, 2001, : 469 - 478
  • [6] Compressed Parameterized Pattern Matching
    Beal, Richard
    Adjeroh, Donald A.
    [J]. 2013 DATA COMPRESSION CONFERENCE (DCC), 2013, : 461 - 470
  • [7] Compressed Consecutive Pattern Matching
    Gawrychowski, Pawel
    Gourdel, Garance
    Starikovskaya, Tatiana
    Steiner, Teresa Anna
    [J]. 2024 DATA COMPRESSION CONFERENCE, DCC, 2024, : 163 - 172
  • [8] Compressed parameterized pattern matching
    Beal, Richard
    Adjeroh, Donald
    [J]. THEORETICAL COMPUTER SCIENCE, 2016, 609 : 129 - 142
  • [9] Pattern matching in LZW compressed files
    Tao, T
    Mukherjee, A
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2005, 54 (08) : 929 - 938
  • [10] Direct pattern matching on compressed text
    de Moura, ES
    Navarro, G
    Ziviani, N
    Baeza-Yates, R
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL - PROCEEDINGS: A SOUTH AMERICAN SYMPOSIUM, 1998, : 90 - 95