Prediction of Amyloid Proteins Using Embedded Evolutionary & Ensemble Feature Selection Based Descriptors With eXtreme Gradient Boosting Model

Cited by: 19
Authors
Akbar, Shahid [1]
Ali, Hashim [1]
Ahmad, Ashfaq [2]
Sarker, Mahidur R. R. [3]
Saeed, Aamir [4]
Salwana, Ely [3]
Gul, Sarah [5]
Khan, Ahmad [1]
Ali, Farman [6]
Affiliations
[1] Abdul Wali Khan Univ Mardan, Dept Comp Sci, Mardan 23200, Khyber Pakhtunk, Pakistan
[2] MY Univ, Dept Comp Sci, Islamabad 44000, Pakistan
[3] Univ Kebangsaan Malaysia, Inst IR4 0, Bangi 43600, Malaysia
[4] Univ Engn & Technol Peshawar, Dept Comp Sci & IT, Peshawar 25000, Pakistan
[5] Int Islamic Univ Islamabad, Dept Biol Sci, FBAS, Islamabad 44000, Pakistan
[6] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Keywords
Amyloid proteins; K-separated bigrams; eXtreme gradient boosting; filter-position specific scoring matrix; ensemble feature selection; classification; OVERSAMPLING TECHNIQUE; DIPEPTIDE COMPOSITION; IDENTIFICATION; SERVER; SMOTE;
DOI
10.1109/ACCESS.2023.3268523
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Amyloid proteins (AMYs) are usually aggregates of insoluble fibrous proteins that have major pathogenic effects on various tissues. Their abnormal deposition may lead to several diseases, e.g., Parkinson's disease, Alzheimer's disease, and type 2 diabetes. In addition, AMYs form amyloid aggregates when they are in a misfolded state. Therefore, it is crucial to accurately predict AMYs and their pathogenic characteristics. Various computational predictors have been presented for the prediction of AMYs; however, their effectiveness is unsatisfactory because of low generalization ability and high training cost. In this work, we propose an intelligent computational predictor for the accurate prediction of AMYs. Novel embedded evolutionary features are extracted by applying K-separated bigrams and the Filter method to the evolutionary descriptors. Moreover, DDE-based enhanced frequency coupling information is gathered from the amyloid sequences. Additionally, a multi-model vector is obtained by combining the features of the applied formulation techniques. To reduce the computational cost of the proposed model, high-ranked features are selected from the heterogeneous vector using eXtreme Gradient Boosting-Recursive Feature Elimination (XGB-RFE). The optimal features are then evaluated with several learners: XGBoost (XGB), Light Gradient Boosted Machine (LGBM), Support Vector Machine (SVM), AdaBoost (ADA), and Extra Trees classifier (ETC). The proposed model achieved a prediction accuracy of 93.10% on the training sequences and 89.67% on the independent sequences, which is approximately 4% higher training accuracy than existing predictors. It is anticipated that our predictive approach will be useful for scientists and might play a key role in drug development and academic research.
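The abstract names K-separated bigrams over evolutionary descriptors but does not give a formula. The sketch below assumes the commonly used definition on a position-specific scoring matrix (PSSM) with L rows: B_k[m][n] is the sum over i of P[i][m] * P[i+k][n]. The function names, the toy matrix, and the choice of k are illustrative assumptions, not the authors' implementation, and the paper's filtering step is not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def k_separated_bigrams(pssm: Matrix, k: int = 1) -> Matrix:
    """Return a C x C matrix B with B[m][n] = sum_i pssm[i][m] * pssm[i + k][n].

    `pssm` has one row per residue and C columns (C = 20 for a standard PSSM).
    This is an assumed, commonly used definition, not taken from the abstract.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not pssm:
        raise ValueError("pssm must have at least one row")
    cols = len(pssm[0])
    if any(len(row) != cols for row in pssm):
        raise ValueError("all pssm rows must have the same length")

    bigram = [[0.0] * cols for _ in range(cols)]
    # Pair each position i with the position k residues downstream.
    for i in range(len(pssm) - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(cols):
            for n in range(cols):
                bigram[m][n] += row_a[m] * row_b[n]
    return bigram


def flatten(matrix: Matrix) -> List[float]:
    """Flatten the C x C bigram matrix row by row into a feature vector."""
    return [value for row in matrix for value in row]


# Hypothetical toy "PSSM" with 3 residues and 2 columns, for illustration only.
toy_pssm = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
features = flatten(k_separated_bigrams(toy_pssm, k=1))
```

For a real 20-column PSSM this yields a 400-dimensional vector per value of k. Vectors for several k values could be concatenated before a feature-selection step such as the XGB-RFE stage the abstract mentions.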
Pages: 39024-39036
Page count: 13