Longest common substring in Longest Common Subsequence's solution service: A novel hyper-heuristic

Cited by: 0
Authors
Abdi, Alireza [1]
Hajsaeedi, Masih [1]
Hooshmand, Mohsen [1]
Affiliations
[1] Inst Adv Studies Basic Sci IASBS, Dept Comp Sci & Informat Technol, Zanjan, Iran
Keywords
Longest common subsequence; Longest common substring; Hyper-heuristic; Upper bound; Beam search
DOI
10.1016/j.compbiolchem.2023.107882
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
The Longest Common Subsequence (LCS) problem asks for the longest subsequence that is common to all strings in a given set. The LCS has applications in computational biology and text editing, among many others. Because the general LCS problem is NP-hard, numerous heuristic algorithms and solvers have been proposed to give the best possible solution for different sets of strings. None of them performs best on all types of sets. In addition, there is no method to determine the type of a given set of strings. Moreover, the available hyper-heuristic is neither efficient nor fast enough to solve this problem in real-world applications. This paper proposes a novel hyper-heuristic for the LCS problem that uses a new criterion to classify a set of strings based on their similarity. To do this, we offer a general stochastic framework to identify the type of a given set of strings. We then introduce the set similarity dichotomizer (S2D) algorithm, based on this framework, which divides sets into two types. This algorithm is introduced for the first time in this paper and opens a new way to go beyond the current LCS solvers. Next, we present our hyper-heuristic, which exploits S2D and one of the internal properties of the given strings to choose the best-matching heuristic among a set of heuristics. We compare the results on benchmark datasets with the best heuristics and hyper-heuristics. The results show that the proposed dichotomizer (i.e., S2D) classifies datasets with 98% accuracy. The proposed hyper-heuristic also achieves performance competitive with the best methods and outperforms the best hyper-heuristics on uncorrelated datasets in terms of both solution quality and run time. All supplementary files, including the source code and datasets, are publicly available on GitHub.
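The abstract relies on the distinction between a common subsequence (characters in order, gaps allowed) and a common substring (characters contiguous). The sketch below illustrates that distinction for the simple two-string case only, using textbook dynamic programming. It is not the paper's hyper-heuristic or the S2D algorithm, and the function names are hypothetical, chosen here for illustration. The general problem over many strings is NP-hard, which is why the paper turns to heuristics.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings (textbook DP)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of `a`
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence ending before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest common contiguous substring of two strings."""
    prev = [0] * (len(b) + 1)  # lengths of matching runs ending at the previous row
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            # A mismatch resets the run, because substrings must be contiguous.
            run = prev[j - 1] + 1 if ch_a == ch_b else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


if __name__ == "__main__":
    x, y = "ABCBDAB", "BDCABA"
    print(lcs_length(x, y))                       # 4, e.g. "BCBA"
    print(longest_common_substring_length(x, y))  # 2, e.g. "AB"
```

For this pair the subsequence is longer than the substring because it may skip characters. The substring length never exceeds the subsequence length, since every common substring is also a common subsequence.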
Pages: 11
Related papers
50 in total
  • [1] A hyper-heuristic for the Longest Common Subsequence problem
    Tabataba, Farzaneh Sadat
    Mousavi, Sayyed Rasoul
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2012, 36 : 42 - 54
  • [2] Efficient Computation for the Longest Common Subsequence with Substring Inclusion and Subsequence Exclusion Constraints
    Wang, Xiaodong
    Zhu, Daxin
    SMART COMPUTING AND COMMUNICATION, SMARTCOM 2016, 2017, 10135 : 419 - 428
  • [3] Efficient algorithms for the longest common subsequence problem with sequential substring constraints
    Tseng, Chiou-Ting
    Yang, Chang-Biau
    Ann, Hsing-Yen
    JOURNAL OF COMPLEXITY, 2013, 29 (01) : 44 - 52
  • [4] Heuristic algorithms for the Longest Filled Common Subsequence Problem
    Mincu, Radu Stefan
    Popa, Alexandru
    2018 20TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2018), 2019, : 449 - 453
  • [5] The longest common substring problem
    Crochemore, Maxime
    Iliopoulos, Costas S.
    Langiu, Alessio
    Mignosi, Filippo
    MATHEMATICAL STRUCTURES IN COMPUTER SCIENCE, 2017, 27 (02) : 277 - 295
  • [6] Computing Longest Common Substring/Subsequence of Non-linear Texts
    Shimohira, Kouji
    Inenaga, Shunsuke
    Bannai, Hideo
    Takeda, Masayuki
    PROCEEDINGS OF THE PRAGUE STRINGOLOGY CONFERENCE 2011, 2011, : 197 - 208
  • [7] Cyclic longest common subsequence
    Naiman, Aaron E.
    Farber, Eliav
    Stein, Yossi
    DISCRETE MATHEMATICS ALGORITHMS AND APPLICATIONS, 2023, 15 (04)
  • [8] Exemplar longest common subsequence
    Bonizzoni, Paola
    Della Vedova, Gianluca
    Dondi, Riccardo
    Fertin, Guillaume
    Rizzi, Raffaella
    Vialette, Stephane
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2007, 4 (04) : 535 - 543
  • [9] On the longest common parameterized subsequence
    Keller, Orgad
    Kopelowitz, Tsvi
    Lewenstein, Moshe
    COMBINATORIAL PATTERN MATCHING, 2008, 5029 : 303 - +
  • [10] On the longest common parameterized subsequence
    Keller, Orgad
    Kopelowitz, Tsvi
    Lewenstein, Moshe
    THEORETICAL COMPUTER SCIENCE, 2009, 410 (51) : 5347 - 5353