Biological Sequence Classification with Multivariate String Kernels

Cited by: 6
Authors
Kuksa, Pavel P. [1 ]
Affiliation
[1] NEC Labs America Inc, Machine Learning Dept, Princeton, NJ 08540 USA
Keywords
Biological sequence classification; kernel methods; protein homology detection; peptide binding; prediction
DOI
10.1109/TCBB.2013.15
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on sequence analysis tasks such as biological sequence classification, remote homology detection, and protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20 percent over existing state-of-the-art sequence classification methods.
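To make the contrast in the abstract concrete, the sketch below shows a standard spectrum (k-mer counting) kernel on discrete sequences, followed by one simple way to apply it to sequences of feature vectors. The second function is an illustrative assumption only and is not the kernel proposed in the paper: the names `multivariate_kernel` and `threshold`, and the choice to binarize each feature dimension and sum per-dimension kernels, are hypothetical.

```python
from collections import Counter


def spectrum_kernel(s, t, k=3):
    """Spectrum kernel: inner product of k-mer count vectors of s and t."""
    cs = Counter(tuple(s[i:i + k]) for i in range(len(s) - k + 1))
    ct = Counter(tuple(t[i:i + k]) for i in range(len(t) - k + 1))
    # Counter returns 0 for k-mers absent from t, so only shared k-mers contribute.
    return sum(n * ct[kmer] for kmer, n in cs.items())


def multivariate_kernel(x, y, k=3, threshold=0.0):
    """Hypothetical multivariate variant (an assumption, not the paper's method):
    binarize each feature dimension, then sum the per-dimension spectrum kernels."""
    dims = len(x[0])
    total = 0
    for d in range(dims):
        sx = [int(v[d] > threshold) for v in x]  # discretized dimension d of x
        sy = [int(v[d] > threshold) for v in y]  # discretized dimension d of y
        total += spectrum_kernel(sx, sy, k)
    return total
```

For discrete input, `spectrum_kernel("ABCAB", "ABC", k=2)` counts matches between the 2-mers of the two strings. For multivariate input, each position is a tuple of real-valued features (for example, physicochemical descriptors of one amino acid), and the resulting kernel values could be passed to any kernel classifier that accepts a precomputed Gram matrix.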
Pages: 1201-1210
Number of pages: 10