Prediction of Amyloid Proteins Using Embedded Evolutionary & Ensemble Feature Selection Based Descriptors With eXtreme Gradient Boosting Model

Cited by: 19
Authors
Akbar, Shahid [1]
Ali, Hashim [1]
Ahmad, Ashfaq [2]
Sarker, Mahidur R. R. [3]
Saeed, Aamir [4]
Salwana, Ely [3]
Gul, Sarah [5]
Khan, Ahmad [1]
Ali, Farman [6]
Affiliations
[1] Abdul Wali Khan Univ Mardan, Dept Comp Sci, Mardan 23200, Khyber Pakhtunk, Pakistan
[2] MY Univ, Dept Comp Sci, Islamabad 44000, Pakistan
[3] Univ Kebangsaan Malaysia, Inst IR4.0, Bangi 43600, Malaysia
[4] Univ Engn & Technol Peshawar, Dept Comp Sci & IT, Peshawar 25000, Pakistan
[5] Int Islamic Univ Islamabad, Dept Biol Sci, FBAS, Islamabad 44000, Pakistan
[6] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Keywords
Amyloid proteins; K-separated bigrams; eXtreme gradient boosting; filter-position specific scoring matrix; ensemble feature selection; classification; OVERSAMPLING TECHNIQUE; DIPEPTIDE COMPOSITION; IDENTIFICATION; SERVER; SMOTE;
DOI
10.1109/ACCESS.2023.3268523
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Amyloid proteins (AMYs) are usually aggregates of insoluble fibrous proteins that have major pathogenic effects on various tissues. Their abnormal deposition may lead to several diseases, e.g., Parkinson's, Alzheimer's, and type 2 diabetes. In addition, AMYs form amyloid aggregates when they are in a misfolded state. It is therefore crucial to accurately predict AMYs and their pathogenic characteristics. Various computational predictors have been presented for the prediction of AMYs; however, their effectiveness is unsatisfactory because of low generalization ability and high training cost. In this work, we propose an intelligent computational predictor for the accurate prediction of AMYs. Novel embedded evolutionary features are gathered by applying K-separated bigrams and the Filter method to the evolutionary descriptors. Moreover, DDE-based enhanced frequency coupling information is gathered from the amyloid sequences. A multi-model vector is then obtained by combining the features of the applied formulation techniques. To reduce the computational cost of the proposed model, high-ranked features are selected from the heterogeneous vector using eXtreme Gradient Boosting-Recursive Feature Elimination (XGB-RFE). The optimal features are then evaluated with several learners, namely XGBoost (XGB), Light Gradient Boosted Machine (LGBM), Support Vector Machine (SVM), AdaBoost (ADA), and Extra Trees classifier (ETC). The proposed model reported an improved prediction accuracy of 93.10% on training sequences and 89.67% on independent sequences, which is approximately 4% higher training accuracy than existing predictors. It is anticipated that our predictive approach will be useful for scientists and might play a key role in drug development and academic research.
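The abstract names K-separated bigrams over evolutionary descriptors but does not give the formula. As a rough illustration only, the sketch below assumes the commonly used definition for a position-specific scoring matrix (PSSM): for a separation k, entry (m, n) is the sum over positions i of PSSM[i][m] * PSSM[i+k][n]. The function name `k_separated_bigrams` and the toy matrix are hypothetical, and the paper's Filter step, DDE features, and XGB-RFE selection are not reproduced here.

```python
from typing import List


def k_separated_bigrams(pssm: List[List[float]], k: int = 1) -> List[float]:
    """Flattened C x C bigram matrix for one separation k (assumed definition).

    T[m][n] = sum over i of pssm[i][m] * pssm[i + k][n],
    where C is the number of PSSM columns (20 for a real protein PSSM).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    length = len(pssm)
    cols = len(pssm[0]) if length else 0
    features = [0.0] * (cols * cols)
    # Pair each position i with the position k residues downstream.
    for i in range(length - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(cols):
            for n in range(cols):
                features[m * cols + n] += row_a[m] * row_b[n]
    return features


# Toy PSSM (hypothetical values): 3 sequence positions, 2 columns.
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bigrams_k1 = k_separated_bigrams(toy_pssm, k=1)
bigrams_k2 = k_separated_bigrams(toy_pssm, k=2)
```

With a real 20-column PSSM each value of k yields 400 features, so concatenating several separations grows the vector quickly. This is one plausible reason a feature selection step such as the XGB-RFE stage described above follows.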
Pages: 39024-39036
Page count: 13