Protein Sequence Classification Using Feature Hashing

Cited by: 16
Authors
Caragea, Cornelia [1]
Silvescu, Adrian [2]
Mitra, Prasenjit [1]
Affiliations
[1] Penn State Univ, Informat Sci & Technol, University Pk, PA 16802 USA
[2] Naviance Inc, Oakland, CA USA
Funding
National Science Foundation (NSF);
Keywords
feature hashing; variable length k-grams; dimensionality reduction; PREDICTION;
DOI
10.1109/BIBM.2011.91
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Recent advances in next-generation sequencing technologies have resulted in an exponential increase in protein sequence data. The k-gram representation used for protein sequence classification usually results in prohibitively high-dimensional input spaces for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, dimensionality reduction techniques can be crucial for both the performance and the complexity of the learning algorithms. We study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by mapping features to hash keys (so that multiple features can be mapped, at random, to the same key) and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" and feature selection approaches. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
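The two steps described in the abstract, building a "bag of k-grams" and then hashing it into a fixed number of keys with colliding counts summed, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the MD5-based hash, the single fixed k, and the bucket count are all hypothetical choices, since this record does not say which hash function or k-gram lengths were used (the keywords mention variable-length k-grams).

```python
import hashlib
from collections import Counter

# The 20 standard amino acids; the explicit k-gram space has 20**k dimensions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kgrams(seq: str, k: int):
    """Yield all overlapping substrings of length k (fixed k, an assumption)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def bag_of_kgrams(seq: str, k: int) -> Counter:
    """Sparse 'bag of k-grams': count of each k-gram that occurs in seq."""
    return Counter(kgrams(seq, k))


def stable_hash(feature: str) -> int:
    """Hypothetical hash choice: first 8 bytes of MD5.

    Python's built-in hash() is randomized per process for strings,
    so a digest is used to keep the feature-to-key mapping reproducible.
    """
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hashed_features(seq: str, k: int, n_buckets: int) -> list[int]:
    """Map each k-gram to one of n_buckets keys, summing counts that collide."""
    vec = [0] * n_buckets
    for gram, count in bag_of_kgrams(seq, k).items():
        vec[stable_hash(gram) % n_buckets] += count
    return vec


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence (22 residues)
    print(len(AMINO_ACIDS) ** 3)  # 8000 explicit dimensions for k = 3
    vec = hashed_features(seq, k=3, n_buckets=64)
    print(len(vec), sum(vec))  # 64 20
```

With k = 3 the explicit space already has 8,000 dimensions, while the hashed vector has only `n_buckets` entries; the total count is preserved because colliding k-grams are summed rather than dropped. Some feature-hashing variants also use a second hash to give each feature a +1/-1 sign; the sketch omits this and simply adds counts, matching the "aggregating" description in the abstract.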
Pages: 538-543
Number of pages: 6
Related papers (50 in total)
  • [21] Feature Selection of Protein Structural Classification Using SVM Classifier
    Krajewski, Zbigniew
    Tkacz, Ewaryst
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2013, 33 (01) : 47 - 61
  • [22] Feature Selection and Classification of Protein Subfamilies Using Rough Sets
    Rahman, Shuzlina Abdul
    Abu Bakar, Azuraliza
    Hussein, Zeti Azura Mohamed
    2009 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS, VOLS 1 AND 2, 2009, : 32 - 35
  • [23] A Distance-Based Feature-Encoding Technique for Protein Sequence Classification in Bioinformatics
    Iqbal, Muhammad Jayed
    Faye, Ibrahima
    Said, Abas Md
    Samir, Brahim Belhaouari
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND CYBERNETICS (CYBERNETICSCOM), 2013, : 1 - 5
  • [24] Robust perceptual image hashing using feature points
    Monga, V
    Evans, RL
    ICIP: 2004 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-5, 2004, : 677 - 680
  • [25] Feature ranking for protein classification
    Mhamdi, F
    Rakotomalala, R
    Elloumi, M
    COMPUTER RECOGNITION SYSTEMS, PROCEEDINGS, 2005, : 611 - 617
  • [26] Feature Pyramid Hashing
    Yang, Yifan
    Geng, Libing
    Lai, Hanjiang
    Pan, Yan
    Yin, Jian
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 114 - 122
  • [27] A SEMANTIC FEATURE EXTRACTION METHOD FOR HYPERSPECTRAL IMAGE CLASSIFICATION BASED ON HASHING LEARNING
    Zhao, Meng
    Yu, Chunyan
    Song, Meiping
    Chang, Chein-I
    2018 9TH WORKSHOP ON HYPERSPECTRAL IMAGE AND SIGNAL PROCESSING: EVOLUTION IN REMOTE SENSING (WHISPERS), 2018,
  • [28] A Feature Sequence Kernel for Video Concept Classification
    Bailer, Werner
    ADVANCES IN MULTIMEDIA MODELING, PT I, 2011, 6523 : 359 - 369
  • [29] Multiclass unbalanced protein data classification using sequence features
    Vani, Suvarna K.
    Sravani, T. D.
    2014 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2014,
  • [30] Protein sequence classification using probabilistic motifs and neural networks
    Blekas, K
    Fotiadis, DI
    Likas, A
    ARTIFICIAL NEURAL NETWORKS AND NEURAL INFORMATION PROCESSING - ICANN/ICONIP 2003, 2003, 2714 : 702 - 709