LBCE-XGB: A XGBoost Model for Predicting Linear B-Cell Epitopes Based on BERT Embeddings

Cited by: 4
Authors
Liu, Yufeng [1]
Liu, Yinbo [1]
Wang, Shuyu [1]
Zhu, Xiaolei [1]
Affiliations
[1] Anhui Agr Univ, Sch Sci, Hefei 230036, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Linear B cell epitope; BERT; XGBoost; Natural language processing; Machine learning; SITES;
DOI
10.1007/s12539-023-00549-z
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Accurate detection of linear B-cell epitopes (BCEs) is important for vaccine design, immunodiagnostic testing, antibody production, and disease prevention and treatment. Wet-lab experiments for determining linear BCEs are both expensive and laborious, and they cannot keep pace with the volume of protein sequence data now available. Computational methods, by contrast, can identify linear BCEs efficiently and at low cost. Although several computational methods are available, their performance is still not satisfactory. We therefore propose a new method, LBCE-XGB, which predicts linear BCEs using the XGBoost algorithm. To represent the biological information contained in peptide sequences, residue embeddings were obtained from a pre-trained, domain-specific BERT model. In addition, five other types of features, including amino acid composition and the amino acid antigenicity scale, were extracted. The best feature combination was selected according to the cross-validation results. Compared with models built using other deep learning and machine learning algorithms, LBCE-XGB achieves the best performance, with an AUROC of 0.845 in fivefold cross-validation. On the independent test set, the model attains an AUROC of 0.838, which is substantially higher than that of other state-of-the-art methods. These results indicate that BERT representations can be an effective feature for predicting linear BCEs, and we believe that LBCE-XGB could be a useful tool for detecting linear B-cell epitopes with high accuracy and at low cost.
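Two ingredients named in the abstract, the amino acid composition feature and the AUROC metric, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and example peptides are hypothetical, the composition follows the common textbook definition (the fraction of each of the 20 standard residues), and the BERT embedding and XGBoost training steps are omitted entirely.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every peptide
# maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the fraction of each standard residue in the peptide (20 values).

    Assumption: non-standard characters are ignored rather than rejected.
    """
    counts = Counter(peptide.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("peptide contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def auroc(labels, scores):
    """Compute AUROC as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the feature layout.
    features = amino_acid_composition("AAGG")
    print(len(features), features[0], features[5])

    # Hypothetical labels (1 = epitope) and classifier scores.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a pipeline of the kind the abstract describes, a vector like `features` would be concatenated with the other feature types before being passed to a classifier, and `auroc` would be evaluated on each held-out fold.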
Pages: 293 - 305
Page count: 13
Related papers
50 in total
  • [1] LBCE-XGB: A XGBoost Model for Predicting Linear B-Cell Epitopes Based on BERT Embeddings
    Yufeng Liu
    Yinbo Liu
    Shuyu Wang
    Xiaolei Zhu
    Interdisciplinary Sciences: Computational Life Sciences, 2023, 15 : 293 - 305
  • [2] Prediction of linear B-cell epitopes based on protein sequence features and BERT embeddings
    Fang Liu
    ChengCheng Yuan
    Haoqiang Chen
    Fei Yang
    Scientific Reports, 14
  • [3] Prediction of linear B-cell epitopes based on protein sequence features and BERT embeddings
    Liu, Fang
    Yuan, Chengcheng
    Chen, Haoqiang
    Yang, Fei
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [4] Predicting linear B-cell epitopes using string kernels
    El-Manzalawy, Yasser
    Dobbs, Drena
    Honavar, Vasant
    JOURNAL OF MOLECULAR RECOGNITION, 2008, 21 (04) : 243 - 255
  • [5] Predicting Protective Linear B-cell Epitopes using Evolutionary Information
    El-Manzalawy, Yasser
    Dobbs, Drena
    Honavar, Vasant
    2008 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS, 2008, : 289 - +
  • [6] Prediction of linear B-cell epitopes
    Davydov, Ya. I.
    Tonevitsky, A. G.
    MOLECULAR BIOLOGY, 2009, 43 (01) : 150 - 158
  • [7] Prediction of linear B-cell epitopes
    Ya. I. Davydov
    A. G. Tonevitsky
    Molecular Biology, 2009, 43 : 150 - 158
  • [8] Linear B-cell epitopes prediction using bagging based proposed ensemble model
    Gupta V.K.
    Gupta A.
    Jain P.
    Kumar P.
    International Journal of Information Technology, 2022, 14 (7) : 3517 - 3526
  • [9] LBCEPred: a machine learning model to predict linear B-cell epitopes
    Alghamdi, Wajdi
    Attique, Muhammad
    Alzahrani, Ebraheem
    Ullah, Malik Zaka
    Khan, Yaser Daanial
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (03)
  • [10] Predicting linear B-cell epitopes using amino acid anchoring pair composition
    Weike Shen
    Yuan Cao
    Lei Cha
    Xufei Zhang
    Xiaomin Ying
    Wei Zhang
    Kun Ge
    Wuju Li
    Li Zhong
    BioData Mining, 8