An Improved Hashing Approach for Biological Sequence to Solve Exact Pattern Matching Problems

Cited by: 1
Authors
Mahmud, Prince [1]
Rahman, Anisur [1]
Hasan Talukder, Kamrul [1]
Affiliations
[1] Khulna Univ, Comp Sci & Engn Discipline, Khulna 9208, Bangladesh
Keywords
ALGORITHMS;
DOI
10.1155/2023/3278505
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pattern matching algorithms have become increasingly important in computer science because they are used in domains such as computational biology, video retrieval, intrusion detection systems, and fraud detection. Pattern matching is the task of finding one or more patterns in a given text. Exact pattern matching algorithms are commonly judged by two measures: the total number of attempts and the number of character comparisons made during the matching process. The primary focus of our proposed method is to reduce both wherever possible. Although fast, hash-based pattern matching algorithms can suffer from hash collisions. This research improves the Efficient Hashing Method (EHM) algorithm. Although EHM is effective, its preprocessing phase is time-consuming and it generates some hash collisions. We propose a novel hashing method that reduces both the preprocessing time and the hash collisions of EHM. We devised the Hashing Approach for Pattern Matching (HAPM) algorithm by combining the best parts of the EHM and Quick Search (QS) algorithms and adding a mechanism to avoid hash collisions. Its preprocessing step combines the bad character table from QS, the hashing strategy from EHM, and the collision-reducing mechanism. To analyze the performance of HAPM, we used three types of datasets: E. coli, DNA sequences, and protein sequences. We compared our proposed method with six algorithms from the literature. The Hash-q with Unique FNG (HqUF) algorithm was compared only on the E. coli and DNA datasets because it creates unique bits for DNA sequences. HAPM also overcomes the problems of the HqUF algorithm. The new method outperforms the older ones in average runtime, number of attempts, and character comparisons for long and short text patterns, though it did worse on some short patterns.
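To make the two evaluation measures concrete, the following is a minimal sketch of the classic Quick Search (QS) algorithm with its bad character table, instrumented to count attempts and character comparisons. It is not the HAPM algorithm: the EHM hashing strategy and the collision-reducing mechanism are not specified in this abstract and are left out. The function name `quick_search` and its return layout are assumptions made for illustration.

```python
def quick_search(text, pattern):
    """Classic Quick Search; returns (match positions, attempts, comparisons).

    Illustrative only: this is plain QS, not the HAPM algorithm.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0, 0

    # Bad character table: for each pattern character, the shift is
    # m minus the index of its rightmost occurrence. Later occurrences
    # overwrite earlier ones. Characters absent from the pattern shift by m + 1.
    shift = {c: m - i for i, c in enumerate(pattern)}

    matches, attempts, comparisons = [], 0, 0
    pos = 0
    while pos <= n - m:
        attempts += 1  # one attempt per window alignment
        i = 0
        while i < m:
            comparisons += 1  # one text/pattern character comparison
            if text[pos + i] != pattern[i]:
                break
            i += 1
        if i == m:
            matches.append(pos)
        if pos + m >= n:
            break  # no character follows the window, so no shift can be looked up
        # Shift according to the text character just past the current window.
        pos += shift.get(text[pos + m], m + 1)
    return matches, attempts, comparisons


print(quick_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))
# → ([5], 5, 15)
```

On this short DNA-like input, QS finds the single occurrence at position 5 using 5 attempts and 15 character comparisons. A hash-based filter of the kind the abstract describes would aim to reject most non-matching windows before the character-by-character loop runs.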
Pages: 16