Biological Sequence Classification with Multivariate String Kernels

Cited by: 6
Author
Kuksa, Pavel P. [1]
Affiliation
[1] NEC Labs America Inc, Machine Learning Dept, Princeton, NJ 08540 USA
Keywords
Biological sequence classification; kernel methods; protein homology detection; peptide binding; prediction
DOI
10.1109/TCBB.2013.15
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
String kernel-based machine learning methods have been highly successful in the analysis of structured and sequential data. They often achieve state-of-the-art performance on sequence analysis tasks such as biological sequence classification, remote homology detection, and protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20 percent over existing state-of-the-art sequence classification methods.
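To make the contrast in the abstract concrete, the following is a minimal sketch, not the paper's actual kernels. It shows a standard k-mer spectrum kernel on discrete 1D strings, and one hypothetical way to extend it to sequences of feature vectors: quantize each feature dimension to a binary symbol and sum the per-dimension spectrum kernels. The function names, the 1-bit threshold quantizer, and the summation over dimensions are all assumptions made for illustration.

```python
from collections import Counter


def spectrum_features(seq, k):
    """Count every contiguous k-mer in a sequence of hashable symbols."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    # Counter returns 0 for k-mers that never occur in y.
    return sum(count * fy[kmer] for kmer, count in fx.items())


def quantize(values, threshold=0.0):
    """Hypothetical 1-bit quantizer: map each real value to 0 or 1."""
    return [1 if v > threshold else 0 for v in values]


def multivariate_kernel(X, Y, k=3):
    """Illustrative multivariate kernel (an assumption, not the paper's).

    X and Y are sequences of feature vectors with the same dimension.
    Each feature dimension is quantized into a binary string, and the
    spectrum kernels of the per-dimension strings are summed.
    """
    total = 0
    for dim_x, dim_y in zip(zip(*X), zip(*Y)):
        total += spectrum_kernel(quantize(dim_x), quantize(dim_y), k)
    return total
```

For example, spectrum_kernel("ABCAB", "ABCD", k=2) compares the 2-mer counts of two discrete strings, while multivariate_kernel takes two lists of equal-length real-valued tuples, one tuple per sequence position.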
Pages: 1201-1210
Number of pages: 10