An Efficient Computational Intelligence Technique for Classification of Protein Sequences

Authors
Iqbal, Muhammad Javed [1 ]
Faye, Ibrahima [2 ]
Said, Abas Md [1 ]
Samir, Brahim Belhaouari [3 ]
Affiliations
[1] Univ Teknol PETRONAS, Dept Comp & Informat Sci, Tronoh, Malaysia
[2] Univ Teknol PETRONAS, Dept Fundamental & Appl Sci, Tronoh, Malaysia
[3] Alfaisal Univ, Coll Sci, Riyadh, Saudi Arabia
Keywords
Bioinformatics; Feature encoding; Data mining; Superfamily; Protein classification
DOI
Not available
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Many artificial intelligence techniques have been developed to extract meaningful information from the constantly increasing volume of data. Accurately annotating an unknown protein by classifying its sequence into an existing superfamily is a critical and challenging task in bioinformatics and computational biology, because such a classification helps in analyzing and modeling unknown proteins to determine their structure and function. In this paper, the proposed framework uses a frequency-based feature encoding technique to represent the amino acids of a protein's primary sequence; the technique considers the occurrence frequency of each amino acid in the sequence. Popular classification algorithms such as decision tree, naive Bayes, neural network, random forest and support vector machine were employed to evaluate the effectiveness of this encoding method. The results indicate that the decision tree classifier clearly outperforms the others in terms of classification accuracy, specificity, sensitivity, F-measure and related measures. A classification accuracy of 88.7% was achieved on Yeast protein sequence data taken from the well-known UniProtKB database.
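The frequency-based encoding and the evaluation measures described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the 20 standard amino acids in alphabetical one-letter order, relative (rather than raw) frequencies, and that non-standard residues are skipped. The function names `frequency_encode` and `one_vs_rest_metrics` are hypothetical, and the metrics use the textbook confusion-matrix definitions for a single class.

```python
from collections import Counter

# Assumed fixed ordering of the 20 standard amino acids (one-letter codes,
# alphabetical) so that every sequence maps to the same 20-dimensional layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_encode(sequence: str) -> list[float]:
    """Hypothetical helper: relative occurrence frequency of each standard amino acid.

    Assumption: counts are divided by the number of standard residues;
    non-standard letters (e.g. X, B, Z, U) are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def one_vs_rest_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Hypothetical helper: textbook confusion-matrix metrics for one class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    # F-measure: harmonic mean of precision and sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    print(dict(zip(AMINO_ACIDS, frequency_encode("MKAAK"))))
    print(one_vs_rest_metrics(tp=3, fp=1, tn=4, fn=2))
```

For example, `frequency_encode("MKAAK")` gives 0.4 at the A and K positions, 0.2 at M and 0 elsewhere. Every sequence thus becomes a fixed-length 20-dimensional vector regardless of its length, which is what lets standard classifiers such as a decision tree (e.g. scikit-learn's `DecisionTreeClassifier`) be trained on the encoded data.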
Pages: 6