Prediction of Amyloid Proteins Using Embedded Evolutionary & Ensemble Feature Selection Based Descriptors With eXtreme Gradient Boosting Model

Cited by: 19
Authors
Akbar, Shahid [1 ]
Ali, Hashim [1 ]
Ahmad, Ashfaq [2 ]
Sarker, Mahidur R. R. [3 ]
Saeed, Aamir [4 ]
Salwana, Ely [3 ]
Gul, Sarah [5 ]
Khan, Ahmad [1 ]
Ali, Farman [6 ]
Affiliations
[1] Abdul Wali Khan Univ Mardan, Dept Comp Sci, Mardan 23200, Khyber Pakhtunk, Pakistan
[2] MY Univ, Dept Comp Sci, Islamabad 44000, Pakistan
[3] Univ Kebangsaan Malaysia, Inst IR4 0, Bangi 43600, Malaysia
[4] Univ Engn & Technol Peshawar, Dept Comp Sci & IT, Peshawar 25000, Pakistan
[5] Int Islamic Univ Islamabad, Dept Biol Sci, FBAS, Islamabad 44000, Pakistan
[6] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Keywords
Amyloid proteins; K-separated bigrams; eXtreme gradient boosting; filter-position specific scoring matrix; ensemble feature selection; classification; OVERSAMPLING TECHNIQUE; DIPEPTIDE COMPOSITION; IDENTIFICATION; SERVER; SMOTE;
DOI
10.1109/ACCESS.2023.3268523
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Amyloid proteins (AMYs) are usually aggregates of insoluble fibrous proteins that have major pathogenic effects on various tissues. Their abnormal deposition may lead to several diseases, e.g., Parkinson's, Alzheimer's, and type 2 diabetes. In addition, AMYs form amyloid aggregates when they are in a misfolded state. Therefore, it is crucial to accurately predict AMYs and their pathogenic characteristics. Various computational predictors have been presented for the prediction of AMYs. However, the effectiveness of these predictors is unsatisfactory due to their low generalization ability and high training cost. In this work, we propose an intelligent computational predictor for the accurate prediction of AMYs. Novel embedded evolutionary features are extracted by applying K-separated bigrams and the filter method to the evolutionary descriptors. Moreover, DDE-based enhanced frequency coupling information is extracted from the amyloid sequences. Additionally, a multi-model vector is obtained by combining the features of the applied formulation techniques. To reduce the computational cost of the proposed model, high-ranked features are selected from the heterogeneous vector using eXtreme Gradient Boosting-Recursive Feature Elimination (XGB-RFE). The optimal features are then evaluated via several learners, namely XGBoost (XGB), Light Gradient Boosted Machine (LGBM), Support Vector Machine (SVM), AdaBoost (ADA), and Extra Trees classifier (ETC). The proposed model achieved an improved prediction accuracy of 93.10% on the training sequences and 89.67% on the independent sequences. This is approximately 4% higher training accuracy than existing predictors. It is anticipated that our predictive approach will be useful for scientists and might play a key role in drug development and academic research.
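The abstract does not give the formula behind the K-separated bigram features, so the following is only a minimal sketch under a stated assumption: that the evolutionary descriptor is a position-specific scoring matrix (PSSM) with one row per residue, and that the commonly used definition applies, in which entry (m, n) sums the product of column m at position i and column n at position i + k. The function name `k_separated_bigrams` and the toy matrix are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def k_separated_bigrams(pssm: Matrix, k: int) -> List[float]:
    """Return the flattened K-separated bigram vector of one PSSM.

    Assumed definition (not confirmed by the abstract): entry (m, n) is
    the sum over valid rows i of pssm[i][m] * pssm[i + k][n], so it
    couples column m at position i with column n at position i + k.
    """
    length = len(pssm)
    if k < 1 or k >= length:
        raise ValueError("k must satisfy 1 <= k < number of rows")
    n_cols = len(pssm[0])
    features = [0.0] * (n_cols * n_cols)
    for i in range(length - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                features[m * n_cols + n] += row_a[m] * row_b[n]
    return features


# Hypothetical toy matrix: 3 positions, 2 columns (a real PSSM has 20 columns).
toy_pssm = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
bigrams_k1 = k_separated_bigrams(toy_pssm, k=1)
bigrams_k2 = k_separated_bigrams(toy_pssm, k=2)
```

With a real 20-column PSSM each value of k would yield a 400-dimensional vector, and vectors for several values of k could be concatenated before a feature selection step such as the XGB-RFE stage mentioned in the abstract.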
Pages: 39024-39036
Number of pages: 13