Efficient Approximation Algorithms for String Kernel Based Sequence Classification

Cited by: 0
Authors
Farhan, Muhammad [1 ]
Tariq, Juvaria [2 ]
Zaman, Arif [1 ]
Shabbir, Mudassir [3 ]
Khan, Imdad Ullah [1 ]
Affiliations
[1] Lahore Univ Management Sci, Sch Sci & Engn, Dept Comp Sci, Lahore, Pakistan
[2] Lahore Univ Management Sci Lahore, Sch Sci & Engn, Dept Math, Lahore, Pakistan
[3] Informat Technol Univ, Dept Comp Sci, Lahore, Pakistan
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sequence classification algorithms, such as SVM, require a definition of a distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between k-mers (k-length subsequences) in the two sequences. Extending this definition, by considering two k-mers to match if their distance is at most m, yields better classification performance. This, however, makes the problem computationally much more complex. Known algorithms to compute this similarity have computational complexity that renders them applicable only for small values of k and m. In this work, we develop novel techniques to efficiently and accurately estimate the pairwise similarity score, which enables us to use much larger values of k and m and to obtain higher predictive accuracy. This opens up a broad avenue for applying this classification approach to audio, image, and text sequences. Our algorithm achieves excellent approximation performance with theoretical guarantees. In the process, we solve an open combinatorial problem that was posed as a major hindrance to the scalability of existing solutions. We give analytical bounds on the quality and runtime of our algorithm and report its empirical performance on real-world biological and music sequence datasets.
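As an illustration of the similarity notion described in the abstract, the following is a minimal sketch of the exact pairwise score computed by brute force. It is not the approximation algorithm of the paper. It assumes that k-mers are contiguous substrings and that the "distance" between two k-mers is the Hamming distance; the function names `kmers`, `hamming`, and `kmer_match_count` are hypothetical and chosen only for this example.

```python
from itertools import product


def kmers(seq, k):
    """Return all contiguous length-k substrings of seq (assumed k-mer definition)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmer_match_count(s, t, k, m=0):
    """Count pairs (a, b), a a k-mer of s and b a k-mer of t,
    whose Hamming distance is at most m.

    Brute force: compares every k-mer of s with every k-mer of t,
    so it is only practical for short sequences.
    """
    return sum(
        1
        for a, b in product(kmers(s, k), kmers(t, k))
        if hamming(a, b) <= m
    )


if __name__ == "__main__":
    # Exact matching (m = 0): only AC/AC and CG/CG match.
    print(kmer_match_count("ACGT", "ACGA", k=2, m=0))  # 2
    # Allowing one mismatch (m = 1) additionally matches GT with GA.
    print(kmer_match_count("ACGT", "ACGA", k=2, m=1))  # 3
```

The brute-force comparison makes the cost of the exact score visible: every pair of k-mers is inspected, which is the kind of expense that motivates an estimation approach for larger k and m.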
Pages: 11
Related papers
50 in total
  • [31] Efficient Reduced Basis Algorithm (ERBA) for Kernel-Based Approximation
    Marchetti, Francesco
    Perracchione, Emma
    JOURNAL OF SCIENTIFIC COMPUTING, 2022, 91 (02)
  • [32] Efficient Reduced Basis Algorithm (ERBA) for Kernel-Based Approximation
    Francesco Marchetti
    Emma Perracchione
    Journal of Scientific Computing, 2022, 91
  • [33] Efficient Algorithms for the Closest String and Distinguishing String Selection Problems
    Wang, Lusheng
    Zhu, Binhai
    FRONTIERS IN ALGORITHMICS, PROCEEDINGS, 2009, 5598 : 261 - +
  • [34] Efficient Algorithms for Kernel Aggregation Queries
    Chan, Tsz Nam
    Hou, Leong U.
    Cheng, Reynold
    Yiu, Man Lung
    Mittal, Shivansh
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (06) : 2726 - 2739
  • [35] An Efficient Classification of Fuzzy XML Documents Based on Kernel ELM
    Zhao, Zhen
    Ma, Zongmin
    Yan, Li
    INFORMATION SYSTEMS FRONTIERS, 2021, 23 (03) : 515 - 530
  • [36] An Efficient Classification of Fuzzy XML Documents Based on Kernel ELM
    Zhen Zhao
    Zongmin Ma
    Li Yan
    Information Systems Frontiers, 2021, 23 : 515 - 530
  • [37] Approximation algorithms for multiple sequence alignment
    Bafna, V
    Lawler, EL
    Pevzner, PA
    THEORETICAL COMPUTER SCIENCE, 1997, 182 (1-2) : 233 - 244
  • [38] Linear and efficient string matching algorithms based on weak factor recognition
    Cantone D.
    Faro S.
    Pavone A.
    ACM Journal of Experimental Algorithmics, 2019, 24 (01):
  • [39] Implicit motif distribution based hybrid computational kernel for sequence classification
    Atalay, V
    Cetin-Atalay, R
    BIOINFORMATICS, 2005, 21 (08) : 1429 - 1436
  • [40] Efficient Approximation Algorithms for Some NP-hard Problems of Partitioning a Set and a Sequence
    Kel'manov, Alexander
    2017 INTERNATIONAL MULTI-CONFERENCE ON ENGINEERING, COMPUTER AND INFORMATION SCIENCES (SIBIRCON), 2017, : 87 - 90