Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

Cited by: 19
Authors
Iqbal, Muhammad Javed [1]
Faye, Ibrahima [2]
Samir, Brahim Belhaouari [3]
Said, Abas Md [1]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Tronoh 31750, Perak, Malaysia
[2] Univ Teknol PETRONAS, Fundamental & Appl Sci Dept, Tronoh 31750, Perak, Malaysia
[3] Alfaisal Univ, Coll Sci, Riyadh 11533, Saudi Arabia
Source
Keywords
FEATURE-EXTRACTION;
DOI
10.1155/2014/173869
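Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code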
07 ; 0710 ; 09 ;
Abstract
Bioinformatics has been an emerging area of research for the last three decades. Its ultimate aims are to store and manage biological data and to develop and analyze computational tools that enhance understanding of that data. The volume of data accumulated by various sequencing projects is increasing exponentially, which presents difficulties for experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms have been proposed. Classifying protein sequences into existing superfamilies helps predict the structure and function of the large number of newly discovered proteins. Existing classification results are unsatisfactory because of the very large number of features produced by various feature encoding methods. In this work, a statistical metric-based feature selection technique is proposed to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth.
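The performance metrics named in the abstract can be illustrated with a minimal sketch, assuming a binary (one-superfamily-versus-rest) setting and standard textbook definitions. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper. Under these definitions, sensitivity and recall are the same quantity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn: counts of true positives, false positives,
    true negatives, and false negatives (hypothetical inputs).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (also called recall): fraction of actual positives found.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: fraction of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


# Illustrative counts only; these are not results from the paper.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class superfamily problem, one common choice is to compute these per class and then average them; whether the authors did so is not stated in the abstract.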
Pages: 12
Related papers
50 in total
  • [41] CLASSIFICATION AND FEATURE SELECTION WITH HUMAN PERFORMANCE DATA
    Pavlopoulou, Christina
    Yu, Stella X.
    2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 1557 - 1560
  • [42] Logic classification and feature selection for biomedical data
    Bertolazzi, P.
    Felici, G.
    Festa, P.
    Lancia, G.
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2008, 55 (05) : 889 - 899
  • [43] Bagging and Feature Selection for Classification with Incomplete Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2017, PT I, 2017, 10199 : 471 - 486
  • [44] A Novel Unsupervised Feature Selection Method for Bioinformatics Data Sets through Feature Clustering
    Li, Guangrong
    Hu, Xiaohua
    Shen, Xiajiong
    Chen, Xin
    Li, Zhoujun
    2008 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2008, : 41 - +
  • [45] Efficient and robust feature extraction and selection for traffic classification
    Shi, Hongtao
    Li, Hongping
    Zhang, Dan
    Cheng, Chaqiu
    Wu, Wei
    COMPUTER NETWORKS, 2017, 119 : 1 - 16
  • [46] An Efficient Selection of HOG Feature for SVM Classification of Vehicle
    Lee, Seung-Hyun
    Bang, MinSuk
    Jung, Kyeong-Hoon
    Yi, Kang
    2015 IEEE INTERNATIONAL SYMPOSIUM ON CONSUMER ELECTRONICS (ISCE), 2015,
  • [47] An efficient feature selection algorithm for hybrid data
    Wang, Feng
    Liang, Jiye
    NEUROCOMPUTING, 2016, 193 : 33 - 41
  • [48] An efficient technique for protein sequence clustering and classification
    Vijay, PA
    Murty, MN
    Subramanian, DK
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, 2004, : 447 - 450
  • [49] A computational efficient algorithm for protein sequence classification
    Li, YM
    Lu, HM
    NANOTECH 2003, VOL 1, 2003, : 24 - 27
  • [50] novel feature selection based on apriori property and correlation analysis for protein sequence classification using MapReduce
    Bhavani, R.
    Sadasivam, G. Sudha
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2017, 17 (03) : 255 - 265