An Improved Hashing Approach for Biological Sequence to Solve Exact Pattern Matching Problems

Cited by: 1
Authors
Mahmud, Prince [1]
Rahman, Anisur [1]
Hasan Talukder, Kamrul [1]
Affiliations
[1] Khulna Univ, Comp Sci & Engn Discipline, Khulna 9208, Bangladesh
Keywords
ALGORITHMS;
DOI
10.1155/2023/3278505
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pattern matching algorithms have become important in computer science, primarily because they are used in domains such as computational biology, video retrieval, intrusion detection systems, and fraud detection. Pattern matching is the task of finding one or more patterns in a given text. Two key measures of how well an exact pattern matching algorithm performs are the total number of attempts and the number of character comparisons made during matching. The primary focus of our proposed method is to reduce both wherever possible. Although hash-based pattern matching algorithms are fast, they may suffer from hash collisions. This research improves the Efficient Hashing Method (EHM) algorithm. Although the EHM algorithm is effective, its preprocessing phase is time-consuming and it generates some hash collisions. We propose a novel hashing method that reduces the preprocessing time and hash collisions of the EHM algorithm. We devised the Hashing Approach for Pattern Matching (HAPM) algorithm by combining the best parts of the EHM and Quick Search (QS) algorithms and adding a mechanism to avoid hash collisions. The preprocessing step of this algorithm combines the bad character table from the QS algorithm, the hashing strategy from the EHM algorithm, and the collision-reducing mechanism. To analyze the performance of HAPM, we used three types of datasets: E. coli, DNA sequences, and protein sequences. We compared our proposed method with six algorithms from the literature. The Hash-q with Unique FNG (HqUF) algorithm was compared only on the E. coli and DNA datasets because it creates unique bits for DNA sequences. Our proposed HAPM algorithm also overcomes the problems of the HqUF algorithm.
The new method outperforms earlier ones in average runtime, number of attempts, and character comparisons for both long and short patterns, although it performed worse on some short patterns.
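
The general idea of pairing a Quick Search bad character table with a hash check can be illustrated with a minimal sketch. This is not the authors' HAPM algorithm and does not reproduce the EHM hash or the collision-avoidance mechanism, none of which are specified in the abstract. The function names, the polynomial hash, and the window length `q` are all assumptions chosen for illustration. The sketch also counts attempts and character comparisons, the two measures named in the abstract, where "comparisons" here means only character comparisons made during verification.

```python
def quick_search_shift_table(pattern):
    """Quick Search bad character table: shift for the character just past the window."""
    m = len(pattern)
    shift = {}
    for i, ch in enumerate(pattern):
        shift[ch] = m - i  # the last occurrence of ch wins
    return shift


def window_hash(s, start, q):
    """Illustrative polynomial hash of s[start:start+q] (an assumption, not the EHM hash)."""
    h = 0
    for ch in s[start:start + q]:
        h = (h * 31 + ord(ch)) & 0xFFFFFFFF
    return h


def hashed_quick_search(text, pattern, q=3):
    """Return (match positions, attempts, character comparisons)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0, 0
    q = max(1, min(q, m))
    shift = quick_search_shift_table(pattern)
    target = window_hash(pattern, m - q, q)  # hash of the pattern's last q characters
    matches, attempts, comparisons = [], 0, 0
    pos = 0
    while pos <= n - m:
        attempts += 1
        # Hash filter: verify character by character only when the hashes agree.
        if window_hash(text, pos + m - q, q) == target:
            matched = True
            for j in range(m):
                comparisons += 1
                if text[pos + j] != pattern[j]:
                    matched = False  # hash collision or partial match
                    break
            if matched:
                matches.append(pos)
        nxt = pos + m  # index of the character just past the current window
        if nxt >= n:
            break
        # Characters absent from the pattern allow the maximal shift of m + 1.
        pos += shift.get(text[nxt], m + 1)
    return matches, attempts, comparisons


if __name__ == "__main__":
    print(hashed_quick_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))
```

In this sketch a hash mismatch skips verification entirely, so a collision costs only the extra character comparisons spent on a window that turns out not to match. Reducing such collisions is the kind of saving the abstract attributes to HAPM.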
Pages: 16