Prediction of Essential Proteins Using Genetic Algorithm as a Feature Selection Technique

Cited by: 0
Authors
Inzamam-Ul-Hossain, Md. [1]
Islam, Md. Rafiqul [1, 2]
Affiliations
[1] Khulna Univ, Dept Comp Sci & Engn, Khulna 9208, Bangladesh
[2] Amer Int Univ Bangladesh AIUB, Dept Comp Sci, Dhaka 1229, Bangladesh
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Proteins; Genetic algorithms; Accuracy; Feature extraction; Biological cells; Encoding; Random forests; Topology; Biological feature; composite features; essential proteins; genetic algorithm; SMOTE-ENN; SMOTE-Tomek; topological feature; DATABASE; CLASSIFICATION; OPTIMIZATION; EXPRESSION;
DOI
10.1109/ACCESS.2024.3446992
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Essential proteins play a vital role in the preparation of antibiotics, disease diagnosis, and understanding the structure of an organism. They are crucial for cell survival and are associated with various human diseases. Recently, many methods have been proposed for identifying essential proteins. These methods improve identification accuracy, but a gap remains between the highest achievable accuracy and the accuracy they reach. Other performance metrics, such as recall, specificity, and F1-score, are also still very low. Given the importance of essential proteins and the limited performance of past work, an efficient approach is proposed to predict essential proteins with high performance. This paper uses a genetic algorithm-based feature selection technique to find the optimal subset of features for identifying essential proteins. For data balancing, different techniques are applied and the best-balanced dataset is kept. Both topological and biological features are used. The Saccharomyces cerevisiae (S. cerevisiae) dataset is used to evaluate the proposed method, and a second dataset for Escherichia coli (E. coli) is used to validate its performance. One of three classifiers (Random Forest, LightGBM, or XGBoost) is used individually in the genetic algorithm's fitness function to calculate the average of accuracy and F1-score. The proposed method produces the best performance metrics on both datasets with fewer features than the original feature set. The highest accuracy achieved is 94.69% for the S. cerevisiae dataset and 95.11% for the E. coli dataset. Other performance scores, such as recall and F1-score, are also high compared to existing methods. In experimental comparisons, the proposed method outperformed the existing methods.
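The abstract describes a genetic algorithm whose fitness function averages a classifier's accuracy and F1-score over a candidate feature subset. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data from the hypothetical `make_toy_data`, the bit-mask encoding, the selection, crossover, and mutation settings, and in particular the nearest-centroid classifier, which stands in for Random Forest, LightGBM, or XGBoost so that the example runs with the standard library only.

```python
import random

def accuracy_f1(y_true, y_pred):
    """Return (accuracy, F1-score) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    denom = 2 * tp + fp + fn
    return acc, (2 * tp / denom if denom else 0.0)

def centroid_predict(X_train, y_train, X_test, cols):
    """Stand-in classifier (assumption): assign each row to the nearest class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
    preds = []
    for x in X_test:
        dist = {lab: sum((x[c] - m) ** 2 for c, m in zip(cols, cen))
                for lab, cen in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds

def fitness(mask, data):
    """Average of accuracy and F1-score using only the features selected by mask."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:  # an empty feature subset cannot be evaluated
        return 0.0
    X_tr, y_tr, X_te, y_te = data
    acc, f1 = accuracy_f1(y_te, centroid_predict(X_tr, y_tr, X_te, cols))
    return (acc + f1) / 2

def ga_select(data, n_features, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Evolve bit masks over the features; return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, data), reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the current best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    best = max(pop, key=lambda m: fitness(m, data))
    return best, fitness(best, data)

def make_toy_data(n=200, n_features=8, seed=1):
    """Synthetic data (assumption): features 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label, 1.0) if j < 2 else rng.gauss(0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    half = n // 2
    return X[:half], y[:half], X[half:], y[half:]

if __name__ == "__main__":
    data = make_toy_data()
    mask, score = ga_select(data, n_features=8)
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
    print("fitness (mean of accuracy and F1):", round(score, 4))
```

In a fuller setup, `centroid_predict` would be replaced by one of the classifiers named in the abstract, and the fitness would be computed on a class-balanced training set rather than on a single fixed split.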
Pages: 126200-126220
Number of pages: 21
Related papers
50 in total
  • [1] Prediction of thermophilic proteins using feature selection technique
    Lin, Hao
    Chen, Wei
    [J]. JOURNAL OF MICROBIOLOGICAL METHODS, 2011, 84 (01) : 67 - 70
  • [2] Application of Genetic Algorithm as Feature Selection Technique in Development of Effective Fault Prediction Model
    Kumar, Lov K
    Rath, Santanu Ku.
    [J]. 2016 IEEE UTTAR PRADESH SECTION INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER AND ELECTRONICS ENGINEERING (UPCON), 2016, : 432 - 437
  • [3] Feature selection with genetic algorithm for protein function prediction
    Santos, Bruno C.
    Rodrigues, Marcos W.
    Pinto, Cristiano L. N.
    Nobre, Cristiane N.
    Zarate, Luis E.
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2019, : 2434 - 2439
  • [4] Improving Heart Disease Prediction Using Feature Selection Through Genetic Algorithm
    Aleem, Abdul
    Prateek, Gautam
    Kumar, Naveen
    [J]. ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2021, 2022, 1534 : 765 - 776
  • [5] Feature subset selection using a genetic algorithm
    Yang, JH
    Honavar, V
    [J]. IEEE INTELLIGENT SYSTEMS & THEIR APPLICATIONS, 1998, 13 (02) : 44 - 49
  • [6] Face feature selection using genetic algorithm
    Yin Hongtao
    Fu Ping
    Sha Xuejun
    [J]. ISTM/2009: 8TH INTERNATIONAL SYMPOSIUM ON TEST AND MEASUREMENT, VOLS 1-6, 2009, : 980 - 983
  • [7] Feature Selection Using Diploid Genetic Algorithm
    Jasuja A.
    [J]. Annals of Data Science, 2020, 7 (01) : 33 - 43
  • [8] Genetic Algorithm Based Feature Selection Technique for Electroencephalography Data
    Ali, Tariq
    Nawaz, Asif
    Sadia, Hafiza Ayesha
    [J]. APPLIED COMPUTER SYSTEMS, 2019, 24 (02) : 119 - 127
  • [9] Feature construction and selection using Genetic Programming and a Genetic Algorithm
    Smith, MG
    Bull, L
    [J]. GENETIC PROGRAMMING, PROCEEDINGS, 2003, 2610 : 229 - 237
  • [10] Feature Selection Using Genetic Algorithm for Big Data
    Saidi, Rania
    Ncir, Waad Bouaguel
    Essoussi, Nadia
    [J]. INTERNATIONAL CONFERENCE ON ADVANCED MACHINE LEARNING TECHNOLOGIES AND APPLICATIONS (AMLTA2018), 2018, 723 : 352 - 361