Biological Sequence Classification with Multivariate String Kernels

Cited by: 6
Author
Kuksa, Pavel P. [1 ]
Affiliation
[1] NEC Labs America Inc, Machine Learning Dept, Princeton, NJ 08540 USA
Keywords
Biological sequence classification; kernel methods; protein homology detection; peptide binding; prediction
DOI
10.1109/TCBB.2013.15
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
String kernel-based machine learning methods have been highly successful in the analysis of structured and sequential data. They often achieve state-of-the-art performance on sequence analysis tasks such as biological sequence classification, remote homology detection, and protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20 percent over existing state-of-the-art sequence classification methods.
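To make the contrast in the abstract concrete, here is a minimal sketch in Python. The first function is the standard k-mer spectrum kernel on discrete strings. The second part is only an illustrative assumption of how a sequence of feature vectors could be handled: each feature dimension is thresholded into a binary string and the per-dimension spectrum kernels are summed. The names `discretize` and `multivariate_kernel`, the threshold, and the summation are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import Counter


def spectrum_kernel(s, t, k=2):
    """Standard k-mer spectrum kernel: dot product of k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs if m in ct)


def discretize(X, threshold=0.0):
    """Hypothetical helper: turn a sequence of d-dimensional feature vectors
    (a list of equal-length rows) into d binary strings, one per dimension."""
    d = len(X[0])
    return [
        "".join("1" if row[j] > threshold else "0" for row in X)
        for j in range(d)
    ]


def multivariate_kernel(X, Y, k=2, threshold=0.0):
    """Assumed illustration, not the paper's kernel: sum the spectrum
    kernels of the per-dimension binary strings."""
    return sum(
        spectrum_kernel(a, b, k)
        for a, b in zip(discretize(X, threshold), discretize(Y, threshold))
    )


if __name__ == "__main__":
    # Discrete 1D case: two short strings over the alphabet {A, B}.
    print(spectrum_kernel("ABAB", "BABA", k=2))
    # Multivariate case: a length-3 sequence of 2-dimensional feature vectors.
    X = [[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]]
    print(multivariate_kernel(X, X, k=2))
```

A sum of valid kernels is itself a valid kernel, so a Gram matrix built this way could in principle be passed to any kernel classifier that accepts precomputed kernels.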
Pages: 1201 - 1210
Page count: 10
Related papers
50 in total
  • [21] Combining Static and Dynamic Features for Multivariate Sequence Classification
    Leontjeva, Anna
    Kuzovkin, Ilya
    PROCEEDINGS OF 3RD IEEE/ACM INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS, (DSAA 2016), 2016, : 21 - 30
  • [22] Hyperspectral imaging coupled with multivariate analysis and artificial intelligence to the classification of maize kernels
    Alimohammadi, Fariba
    Rasekh, Mansour
    Sayyah, Amir Hosein Afkari
    Abbaspour-Gilandeh, Yousef
    Karami, Hamed
    Sharabiani, Vali Rasooli
    Fioravanti, Ambra
    Gancarz, Marek
    Findura, Pavol
    Kwasniewski, Dariusz
    INTERNATIONAL AGROPHYSICS, 2022, 36 (02) : 83 - 91
  • [23] Analysis of string-searching algorithms on biological sequence databases
    Sheik, SS
    Aggarwal, SK
    Poddar, A
    Sathiyabhama, B
    Balakrishnan, N
    Sekar, K
    CURRENT SCIENCE, 2005, 89 (02) : 368 - 374
  • [24] Efficient Approximation Algorithms for String Kernel Based Sequence Classification
    Farhan, Muhammad
    Tariq, Juvaria
    Zaman, Arif
    Shabbir, Mudassir
    Khan, Imdad Ullah
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [25] HASKER: An efficient algorithm for string kernels. Application to polarity classification in various languages
    Popescu, Marius
    Grozea, Cristian
    Ionescu, Radu Tudor
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS, 2017, 112 : 1755 - 1763
  • [26] Tutorial: multivariate classification for vibrational spectroscopy in biological samples
    Camilo L. M. Morais
    Kássio M. G. Lima
    Maneesh Singh
    Francis L. Martin
    Nature Protocols, 2020, 15 : 2143 - 2162
  • [27] Tutorial: multivariate classification for vibrational spectroscopy in biological samples
    Morais, Camilo L. M.
    Lima, Kassio M. G.
    Singh, Maneesh
    Martin, Francis L.
    NATURE PROTOCOLS, 2020, 15 (07) : 2143 - 2162
  • [28] Learning interpretable SVMs for biological sequence classification
    Rätsch, G
    Sonnenburg, S
    Schäfer, C
    BMC BIOINFORMATICS, 2006, 7 (Suppl 1)
  • [29] Segment and combine approach for biological sequence classification
    Geurts, P
    Cuesta, AB
    Wehenkel, L
    PROCEEDINGS OF THE 2005 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2005, : 194 - 201
  • [30] Learning interpretable SVMs for biological sequence classification
    Sonnenburg, S
    Rätsch, G
    Schäfer, C
    RESEARCH IN COMPUTATIONAL MOLECULAR BIOLOGY, PROCEEDINGS, 2005, 3500 : 389 - 407