Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

Cited by: 19
Authors
Iqbal, Muhammad Javed [1]
Faye, Ibrahima [2]
Samir, Brahim Belhaouari [3]
Said, Abas Md [1]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Tronoh 31750, Perak, Malaysia
[2] Univ Teknol PETRONAS, Fundamental & Appl Sci Dept, Tronoh 31750, Perak, Malaysia
[3] Alfaisal Univ, Coll Sci, Riyadh 11533, Saudi Arabia
Source
Keywords
FEATURE-EXTRACTION;
DOI
10.1155/2014/173869
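Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code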
07 ; 0710 ; 09 ;
Abstract
Bioinformatics has been an emerging area of research for the last three decades. Its ultimate aims are to store and manage biological data and to develop and analyze computational tools that enhance the understanding of those data. The volume of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms have been proposed. Classifying protein sequences into existing superfamilies helps predict the structure and function of the large number of newly discovered proteins. Existing classification results are unsatisfactory because of the very large number of features produced by various feature encoding methods. In this work, a statistical metric-based feature selection technique is proposed to reduce the size of the extracted feature vector. The proposed protein classification method shows significant improvement on performance metrics including accuracy, sensitivity, specificity, recall, and F-measure.
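As a rough illustration of the pipeline the abstract describes (encode sequences as features, rank features with a statistical metric, keep a reduced vector, then report standard metrics), here is a minimal Python sketch. The abstract does not name the encoding or the metric, so everything here is an assumption: overlapping 2-gram frequencies stand in for the feature encoding, a two-class Fisher-style score stands in for the statistical metric, and the function names and toy sequences are hypothetical, not taken from the paper.

```python
from collections import Counter
from statistics import mean, pvariance


def ngram_features(seq, n=2):
    """Relative frequency of each overlapping n-gram in one sequence."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    total = len(grams) or 1
    return {g: c / total for g, c in Counter(grams).items()}


def fisher_scores(rows, labels):
    """Score each feature by between-class separation over within-class spread.

    Assumption: a two-class Fisher-style score stands in for the paper's
    statistical metric, which the abstract does not specify.
    """
    vocab = sorted({g for row in rows for g in row})
    scores = {}
    for g in vocab:
        pos = [row.get(g, 0.0) for row, y in zip(rows, labels) if y == 1]
        neg = [row.get(g, 0.0) for row, y in zip(rows, labels) if y == 0]
        spread = pvariance(pos) + pvariance(neg) + 1e-9  # avoid division by zero
        scores[g] = (mean(pos) - mean(neg)) ** 2 / spread
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), specificity, and F-measure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "sensitivity": sens,
            "specificity": spec, "f_measure": f1}


# Toy, made-up sequences (not real protein data): class 1 is rich in "AK".
seqs = ["AKAKAKGL", "AKAKGAKA", "GLGLGLVV", "VVGLGLGL"]
labels = [1, 1, 0, 0]
rows = [ngram_features(s) for s in seqs]
selected = select_top_k(fisher_scores(rows, labels), k=3)
print(selected)
```

A filter-style score like this is computed once per feature and independently of any classifier, which is why such methods are a common first step when the encoded feature vector is very large. Note that sensitivity and recall are the same quantity, so the sketch reports it once.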
Pages: 12