Semi-supervised learning for classification of protein sequence data

Cited by: 0
Authors
King, Brian R. [1 ]
Guda, Chittibabu [2 ]
Affiliations
[1] SUNY Albany, Dept Comp Sci, Albany, NY 12222 USA
[2] SUNY Albany, Dept Epidemiol & Biostat, Gen NY Sis Ctr Excellence Canc Genom, Albany, NY 12222 USA
Keywords
Bioinformatics; protein sequence classification; semi-supervised learning; expectation maximization; EM
DOI
10.1155/2008/795010
Chinese Library Classification number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
Protein sequence data continue to become available at an exponential rate. Annotation of the functional and structural attributes of these data lags far behind, with only a small fraction of the data understood and labeled by experimental methods. Classification methods based on semi-supervised learning can increase the overall accuracy of classifying partly labeled data in many domains, but very few have been evaluated on protein sequence classification. We show how proven methods from text classification can be applied to protein sequence data, consider both existing and novel extensions to the basic methods, and describe the restrictions and differences that must be taken into account. We compare our results against the transductive support vector machine and show superior results on the most difficult classification problems. Our results show that large repositories of unlabeled protein sequence data can indeed be used to improve predictive performance, particularly when few labeled protein sequences are available, the data are highly unbalanced, or both.
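The abstract names expectation maximization and methods borrowed from text classification, but does not specify the model. As a rough illustration only, the sketch below assumes a multinomial naive Bayes classifier over overlapping k-mer counts (the sequence analogue of a bag of words) and uses EM to fold unlabeled sequences into training. The function names, the choice of k = 2, the smoothing constant, and the toy sequences are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence (assumed bag-of-words analogue)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: fit smoothed priors and k-mer likelihoods from soft class weights.

    weights[i][c] is the probability that docs[i] belongs to class c.
    """
    log_prior, log_lik = {}, {}
    for c in classes:
        class_mass = sum(w[c] for w in weights)
        log_prior[c] = math.log(
            (class_mass + alpha) / (len(docs) + alpha * len(classes)))
        counts = Counter()
        for doc, w in zip(docs, weights):
            for kmer, cnt in doc.items():
                counts[kmer] += w[c] * cnt
        total = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {v: math.log((counts[v] + alpha) / total) for v in vocab}
    return log_prior, log_lik

def posterior(doc, log_prior, log_lik, classes):
    """E-step for one sequence: normalized class posteriors (k-mers outside the vocabulary are ignored)."""
    scores = {
        c: log_prior[c] + sum(cnt * log_lik[c][kmer]
                              for kmer, cnt in doc.items() if kmer in log_lik[c])
        for c in classes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    expd = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(expd.values())
    return {c: e / z for c, e in expd.items()}

def em_naive_bayes(labeled, unlabeled, k=2, n_iter=10):
    """Semi-supervised training: labeled weights stay fixed, unlabeled ones are re-estimated."""
    classes = sorted({y for _, y in labeled})
    lab_docs = [kmer_counts(s, k) for s, _ in labeled]
    unl_docs = [kmer_counts(s, k) for s in unlabeled]
    vocab = sorted({km for d in lab_docs + unl_docs for km in d})
    lab_w = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(lab_docs, lab_w, classes, vocab)  # start from labeled data only
    for _ in range(n_iter):
        unl_w = [posterior(d, *model, classes) for d in unl_docs]
        model = train_nb(lab_docs + unl_docs, lab_w + unl_w, classes, vocab)
    return model, classes

def predict(seq, model, classes, k=2):
    """Return the most probable class for a new sequence."""
    post = posterior(kmer_counts(seq, k), *model, classes)
    return max(post, key=post.get)

# Made-up toy sequences and labels, for illustration only.
labeled = [("AAAAKKAA", "a"), ("GGGGLLGG", "b")]
unlabeled = ["AAKKAAAK", "GGLLGGGL", "KKAAAA", "LLGGGG"]
model, classes = em_naive_bayes(labeled, unlabeled)
print(predict("AKAKAA", model, classes))
print(predict("GLGLGG", model, classes))
```

In this sketch the unlabeled sequences contribute with full weight in every M-step. Down-weighting them relative to the labeled data is a common variation in the text-classification literature; whether the paper does so is not stated in the abstract.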
Pages: 5-29
Number of pages: 25