Longest common substring in Longest Common Subsequence's solution service: A novel hyper-heuristic

Cited by: 0

Authors:

Abdi, Alireza [1]

Hajsaeedi, Masih [1]

Hooshmand, Mohsen [1]

Affiliations:

[1] Inst Adv Studies Basic Sci IASBS, Dept Comp Sci & Informat Technol, Zanjan, Iran

Source:

COMPUTATIONAL BIOLOGY AND CHEMISTRY | 2023 / Vol. 105

Keywords:

Longest common subsequence; Longest common substring; Hyper-heuristic; Upper bound; Beam search

DOI:

10.1016/j.compbiolchem.2023.107882

Chinese Library Classification (CLC) number:

Q [Biological sciences]

Discipline classification codes:

07; 0710; 09

Abstract:

The Longest Common Subsequence (LCS) problem asks for the longest subsequence that is common to every string in a given set. The LCS has applications in computational biology and text editing, among many others. Because the general LCS problem is NP-hard, numerous heuristic algorithms and solvers have been proposed to give the best possible solution for different sets of strings. None of them performs best on all types of sets. In addition, there is no method to determine the type of a given set of strings. Moreover, the available hyper-heuristic is neither efficient nor fast enough to solve this problem in real-world applications. This paper proposes a novel hyper-heuristic for the LCS problem that uses a new criterion to classify a set of strings based on their similarity. To do this, we offer a general stochastic framework to identify the type of a given set of strings. Based on this framework, we introduce the set similarity dichotomizer (S2D) algorithm, which divides sets into two types. This algorithm is introduced for the first time in this paper and opens a new way to go beyond the current LCS solvers. We then present our proposed hyper-heuristic, which exploits the S2D and one of the internal properties of the given strings to choose the best-matching heuristic from a set of heuristics. We compare the results on benchmark datasets with the best heuristics and hyper-heuristics. The results show that our proposed dichotomizer (i.e., S2D) can classify datasets with 98% accuracy. In addition, our proposed hyper-heuristic obtains competitive performance in comparison with the best methods and outperforms the best hyper-heuristics on uncorrelated datasets in terms of both solution quality and run time. All supplementary files, including the source code and datasets, are publicly available on GitHub.
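The title and keywords name two related notions, the longest common subsequence and the longest common substring. As an illustration only, the following minimal Python sketch computes both lengths for the two-string case with textbook dynamic programming. It is not the hyper-heuristic or the S2D algorithm described in the abstract, which target sets of many strings; the function names `lcs_length` and `longest_common_substring_length` are hypothetical and chosen here for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings.

    Textbook O(len(a) * len(b)) dynamic programme; characters of the
    subsequence need not be adjacent in either string.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one character from either string.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest common substring of two strings.

    Same table shape as above, but a mismatch resets the run to zero
    because a substring must be contiguous.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
                best = max(best, curr[j])
            else:
                curr.append(0)
        prev = curr
    return best


if __name__ == "__main__":
    x, y = "ABCBDAB", "BDCABA"
    print(lcs_length(x, y))                       # subsequence length
    print(longest_common_substring_length(x, y))  # substring length
```

A common substring is always a common subsequence, so the second value never exceeds the first. For more than two strings the exact dynamic programme grows exponentially with the number of strings, which is why the abstract turns to heuristics.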


Pages: 11
