Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

Cited by: 19
Authors
Iqbal, Muhammad Javed [1]
Faye, Ibrahima [2]
Samir, Brahim Belhaouari [3]
Said, Abas Md [1]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Tronoh 31750, Perak, Malaysia
[2] Univ Teknol PETRONAS, Fundamental & Appl Sci Dept, Tronoh 31750, Perak, Malaysia
[3] Alfaisal Univ, Coll Sci, Riyadh 11533, Saudi Arabia
Source
Keywords
FEATURE-EXTRACTION;
DOI
10.1155/2014/173869
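Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code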
07 ; 0710 ; 09 ;
Abstract
Bioinformatics has been an emerging area of research for the last three decades. Its ultimate aims are to store and manage biological data and to develop and analyze computational tools that improve our understanding of those data. The volume of data accumulated by sequencing projects is increasing exponentially, which presents difficulties for experimental methods. To narrow the gap between newly sequenced proteins and proteins with known functions, many computational techniques based on classification and clustering algorithms have been proposed. Classifying protein sequences into existing superfamilies helps predict the structure and function of the large number of newly discovered proteins. Existing classification results are unsatisfactory because the various feature encoding methods produce a very large number of features. In this work, a statistical metric-based feature selection technique is proposed to reduce the size of the extracted feature vector. The proposed protein classification method shows significant improvement in performance metrics such as accuracy, sensitivity, specificity, recall, and F-measure.
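The abstract does not say which statistical metric or feature encoding the authors use. As an illustration only, the general idea of metric-based feature selection can be sketched as follows. Everything here is an assumption: amino-acid composition stands in for the encoding, a two-class Fisher-style score stands in for the metric, and the sequences and labels are made-up toy data. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Encode a sequence as its 20 amino-acid frequencies (assumed encoding)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def fisher_scores(X, y):
    """Score each feature by (difference of class means)^2 / (sum of class variances).

    This Fisher-style score is a stand-in metric, not the one from the paper.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [row[j] for row in pos]
        q = [row[j] for row in neg]
        spread = pvariance(p) + pvariance(q)
        # A small constant avoids division by zero for constant features.
        scores.append((mean(p) - mean(q)) ** 2 / (spread + 1e-12))
    return scores

def select_top_k(X, y, k):
    """Keep the k highest-scoring feature columns; return their indices and the reduced matrix."""
    scores = fisher_scores(X, y)
    keep = sorted(sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k])
    return keep, [[row[j] for j in keep] for row in X]

# Toy data (invented): class 1 is lysine-rich, class 0 is leucine-rich.
seqs = ["MKKLKKAKKG", "KKAKKMKGGL", "MLLALLGLLV", "LLVLLMLAAG"]
labels = [1, 1, 0, 0]

X = [composition(s) for s in seqs]
keep, X_reduced = select_top_k(X, labels, k=3)
print([AMINO_ACIDS[j] for j in keep])
```

The reduced matrix `X_reduced` would then be passed to whatever classifier is used, and measures such as accuracy, sensitivity, and specificity computed on held-out sequences.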
Pages: 12
Related papers
  • [1] Efficient feature selection and classification for microarray data
    Li, Zifa
    Xie, Weibo
    Liu, Tao
    PLOS ONE, 2018, 13 (08)
  • [2] Penalized feature selection and classification in bioinformatics
    Ma, Shuangge
    Huang, Jian
    BRIEFINGS IN BIOINFORMATICS, 2008, 9 (05) : 392 - 403
  • [3] Novel and efficient method on feature selection and data classification
    Chen, Tieming
    Ma, Jixia
    Huang, Samuel H.
    Cai, Jiamei
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2012, 49 (04) : 735 - 745
  • [4] Multi-Label Bioinformatics Data Classification With Ensemble Embedded Feature Selection
    Guo, Yumeng
    Chung, Fu-Lai
    Li, Guozheng
    Zhang, Lei
    IEEE ACCESS, 2019, 7 : 103863 - 103875
  • [5] A Distance-Based Feature-Encoding Technique for Protein Sequence Classification in Bioinformatics
    Iqbal, Muhammad Jayed
    Faye, Ibrahima
    Said, Abas Md
    Samir, Brahim Belhaouari
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND CYBERNETICS (CYBERNETICSCOM), 2013, : 1 - 5
  • [6] Feature selection for genetic sequence classification
    Chuzhanova, NA
    Jones, AJ
    Margetts, S
    BIOINFORMATICS, 1998, 14 (02) : 139 - 143
  • [7] A Novel Technique of Feature Selection with ReliefF and CFS for Protein Sequence Classification
    Kaur, Kiranpreet
    Patil, Nagamma
    RECENT FINDINGS IN INTELLIGENT COMPUTING TECHNIQUES, VOL 1, 2019, 707 : 399 - 405
  • [8] A fast and novel approach based on grouping and weighted mRMR for feature selection and classification of protein sequence data
    Kaur, Kiranpreet
    Patil, Nagamma
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2020, 23 (01) : 47 - 61
  • [9] Feature Selection Techniques for Bioinformatics Data Analysis
    Theng, Dipti
    Bhoyar, K. K.
    2022 INTERNATIONAL CONFERENCE ON GREEN ENERGY, COMPUTING AND SUSTAINABLE TECHNOLOGY (GECOST), 2022, : 46 - 50
  • [10] An efficient statistical feature selection approach for classification of gene expression data
    Chandra, B.
    Gupta, Manish
    JOURNAL OF BIOMEDICAL INFORMATICS, 2011, 44 (04) : 529 - 535