Efficient Approximation Algorithms for String Kernel Based Sequence Classification

Cited by: 0
Authors
Farhan, Muhammad [1 ]
Tariq, Juvaria [2 ]
Zaman, Arif [1 ]
Shabbir, Mudassir [3 ]
Khan, Imdad Ullah [1 ]
Affiliations
[1] Lahore Univ Management Sci, Sch Sci & Engn, Dept Comp Sci, Lahore, Pakistan
[2] Lahore Univ Management Sci Lahore, Sch Sci & Engn, Dept Math, Lahore, Pakistan
[3] Informat Technol Univ, Dept Comp Sci, Lahore, Pakistan
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sequence classification algorithms, such as SVM, require a definition of a distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between k-mers (k-length subsequences) in the two sequences. Extending this definition, by considering two k-mers to match if their distance is at most m, yields better classification performance. This, however, makes the problem computationally much more complex. Known algorithms to compute this similarity have computational complexity that renders them applicable only for small values of k and m. In this work, we develop novel techniques to efficiently and accurately estimate the pairwise similarity score, which enables us to use much larger values of k and m and to get higher predictive accuracy. This opens up a broad avenue for applying this classification approach to audio, image, and text sequences. Our algorithm achieves excellent approximation performance with theoretical guarantees. In the process, we solve an open combinatorial problem, which was posed as a major hindrance to the scalability of existing solutions. We give analytical bounds on the quality and runtime of our algorithm and report its empirical performance on real-world biological and music sequence datasets.
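The similarity notion described in the abstract can be illustrated with a brute-force sketch. This is not the authors' approximation algorithm. It only shows what is being counted, under two assumptions the abstract does not state: that k-mers are contiguous substrings, and that the "distance" between two k-mers is the Hamming distance. The function names are hypothetical.

```python
from itertools import product


def kmers(seq, k):
    """Return every contiguous length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def pair_count_similarity(s, t, k, m):
    """Count k-mer pairs (one from s, one from t) within distance m.

    Assumption: distance is Hamming distance. With m = 0 this reduces to
    counting exact k-mer matches between the two sequences.
    """
    return sum(
        1
        for a, b in product(kmers(s, k), kmers(t, k))
        if hamming(a, b) <= m
    )
```

For example, `pair_count_similarity("ACGT", "ACGA", 2, 0)` counts the two exact matches (AC and CG), while raising m to 1 also admits the pair (GT, GA). The nested comparison costs time proportional to the product of the sequence lengths times k per pair of sequences, which hints at why exact computation becomes a bottleneck and why an estimate of the score is attractive.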
Pages: 11
Related papers
50 in total
  • [1] Efficient Approximate Kernel Based Spike Sequence Classification
    Ali, Sarwan
    Sahoo, Bikram
    Khan, Muhammad Asad
    Zelikovsky, Alexander
    Khan, Imdad Ullah
    Patterson, Murray
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (06) : 3376 - 3388
  • [2] A Multi-Fold String Kernel for Sequence Classification
    Maiti, Aniruddha
    Ghorai, Santanu
    Mukherjee, Anirban
    2015 37TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2015, : 6469 - 6472
  • [3] Molecular sequence classification using efficient kernel based embedding
    Ali, Sarwan
    Ali, Tamkanat E.
    Murad, Taslim
    Mansoor, Haris
    Patterson, Murray
    INFORMATION SCIENCES, 2024, 679
  • [4] An approximation of the Gaussian RBF kernel for efficient classification with SVMs
    Ring, Matthias
    Eskofier, Bjoern M.
    PATTERN RECOGNITION LETTERS, 2016, 84 : 107 - 113
  • [5] Efficient sequence classification by R2-Kernel
    Lei, Hansheng
    VISUALIZATION AND DATA ANALYSIS 2008, 2008, 6809
  • [6] Lambda pruning: an approximation of the string subsequence kernel for practical SVM classification and redundancy clustering
    Seewald, Alexander K.
    Kleedorfer, Florian
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2007, 1 (03) : 221 - 239
  • [8] A Unified String Kernel for Biology Sequence
    Yuan, Dehui
    Yang, Shengyun
    Lai, Guoming
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, PROCEEDINGS: WITH ASPECTS OF ARTIFICIAL INTELLIGENCE, 2008, 5227 : 633 - 641
  • [9] Efficient randomized tensor-based algorithms for function approximation and low-rank kernel interactions
    Arvind K. Saibaba
    Rachel Minster
    Misha E. Kilmer
    Advances in Computational Mathematics, 2022, 48