Biological Sequence Classification with Multivariate String Kernels

Cited by: 6
Authors
Kuksa, Pavel P. [1 ]
Affiliations
[1] NEC Labs America Inc, Machine Learning Dept, Princeton, NJ 08540 USA
Keywords
Biological sequence classification; kernel methods; protein homology detection; peptide binding; prediction
DOI
10.1109/TCBB.2013.15
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
String kernel-based machine learning methods have been highly successful in the analysis of structured and sequential data. They often achieve state-of-the-art performance on sequence analysis tasks such as biological sequence classification, remote homology detection, and protein superfamily and fold prediction. However, typical string kernel methods rely on the analysis of discrete 1D string data (e.g., DNA or amino acid sequences). In this paper, we address multiclass biological sequence classification problems using multivariate representations in the form of sequences of feature vectors (as in biological sequence profiles, or sequences of individual amino acid physicochemical descriptors) and a class of multivariate string kernels that exploit these representations. On three protein sequence classification tasks, the proposed multivariate representations and kernels show significant improvements of 15-20 percent over existing state-of-the-art sequence classification methods.
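For readers unfamiliar with the discrete 1D setting that the abstract contrasts against, the following is a minimal sketch of a k-mer spectrum kernel, one common string kernel for plain sequences. It is an illustration only and not the multivariate kernel proposed in the paper; the function names `kmer_counts` and `spectrum_kernel` and the default `k=3` are assumptions made for this example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y.

    Illustrative discrete string kernel; not the paper's multivariate kernel.
    """
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Iterate over the smaller table; a k-mer missing from either
    # sequence contributes zero to the sum.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(n * cy[kmer] for kmer, n in cx.items())
```

Two sequences score highly when they share many k-mers, which is why a kernel of this kind only sees discrete symbols. The multivariate representations described in the abstract replace each symbol with a feature vector, which this sketch does not attempt to model.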
Pages: 1201-1210
Page count: 10