Pre-trained protein language model sheds new light on the prediction of Arabidopsis protein-protein interactions

Cited by: 4
Authors
Zhou, Kewei [1]
Lei, Chenping [1]
Zheng, Jingyan [1]
Huang, Yan [1]
Zhang, Ziding [1]
Affiliations
[1] China Agr Univ, Coll Biol Sci, State Key Lab Anim Biotech Breeding, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Arabidopsis; Protein-protein interactions; Machine learning; Pre-trained language model; Natural language processing; ANNOTATION; DATABASES;
DOI
10.1186/s13007-023-01119-6
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Background: Protein-protein interactions (PPIs) are involved in many biological processes. Identifying PPIs in the model plant Arabidopsis is therefore important for understanding plant growth and development and, in turn, for supporting basic research on crop improvement. Although many Arabidopsis PPIs have been determined experimentally, the known Arabidopsis interactome is far from complete. Effective machine learning models that learn from existing PPI data to predict unknown Arabidopsis PPIs conveniently and rapidly are therefore still urgently needed.
Results: We used a large-scale pre-trained protein language model (pLM), ESM-1b, to convert protein sequences into high-dimensional vectors, which then served as the input to a multilayer perceptron (MLP). To avoid the performance overestimation that frequently occurs in PPI prediction, we trained and evaluated the predictive model on stringent datasets. The combination of ESM-1b and MLP (i.e., ESMAraPPI) was more accurate than predictive models built on other pLMs or baseline sequence encoding schemes. In particular, ESMAraPPI yielded an AUPR value of 0.810 on an independent test set in which both proteins of each pair are unseen in the training dataset, suggesting strong generalization and extrapolation ability. Moreover, ESMAraPPI performed better than several state-of-the-art generic or plant-specific PPI predictors.
Conclusion: Protein sequence embeddings from the pre-trained model ESM-1b contain rich protein semantic information. Combined with the MLP algorithm, ESM-1b showed excellent performance in predicting Arabidopsis PPIs. We anticipate that the proposed predictive model (ESMAraPPI) can serve as a very competitive tool to accelerate the identification of the Arabidopsis interactome.
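The evaluation idea in the abstract (a test set in which both proteins of every pair are unseen during training, scored by AUPR) can be illustrated with a minimal sketch. This is not the authors' code: the function names `strict_split` and `average_precision`, the toy protein identifiers, and the choice to drop pairs that mix seen and unseen proteins are all assumptions made for illustration. The ESM-1b embedding and MLP steps are omitted because they require external libraries, and AUPR is approximated here by the step-wise average precision.

```python
import random


def strict_split(pairs, test_fraction=0.2, seed=0):
    """Split labeled protein pairs so that no test protein occurs in training.

    pairs: list of (protein_a, protein_b, label) tuples (hypothetical format).
    Pairs that combine a held-out protein with a training protein are dropped.
    """
    proteins = sorted({p for a, b, _ in pairs for p in (a, b)})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    held_out = set(proteins[:n_test])
    train = [t for t in pairs if t[0] not in held_out and t[1] not in held_out]
    test = [t for t in pairs if t[0] in held_out and t[1] in held_out]
    return train, test


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels are 0/1, scores are predicted interaction scores.
    Tied scores are ranked in input order; no tie correction is applied.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Toy data: every pair among ten made-up proteins, with arbitrary labels.
names = [f"P{i}" for i in range(10)]
toy_pairs = [
    (names[i], names[j], (i + j) % 2)
    for i in range(len(names))
    for j in range(i + 1, len(names))
]
train_pairs, test_pairs = strict_split(toy_pairs, test_fraction=0.3, seed=1)
train_proteins = {p for a, b, _ in train_pairs for p in (a, b)}
test_proteins = {p for a, b, _ in test_pairs for p in (a, b)}
print(len(train_pairs), len(test_pairs), train_proteins & test_proteins)
```

Splitting by protein rather than by pair is what prevents a model from scoring well simply by recognizing proteins it has already seen, which is the overestimation the abstract refers to.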
Pages: 10
Related papers
50 in total
  • [1] Pre-trained protein language model sheds new light on the prediction of Arabidopsis protein–protein interactions
    Kewei Zhou
    Chenping Lei
    Jingyan Zheng
    Yan Huang
    Ziding Zhang
    Plant Methods, 19
  • [2] LPBERT: A Protein-Protein Interaction Prediction Method Based on a Pre-Trained Language Model
    Hu, An
    Kuang, Linai
    Yang, Dinghai
    APPLIED SCIENCES-BASEL, 2025, 15 (06)
  • [3] A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training
    Tang, Zhan
    Guo, Xuchao
    Bai, Zhao
    Diao, Lei
    Lu, Shuhan
    Li, Lin
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2022, 16 (03): 771-791
  • [4] Extracting Protein-Protein Interactions Affected by Mutations via Auxiliary Task and Domain Pre-trained Model
    Wang, Yu
    Zhang, Shaowu
    Zhang, Yijia
    Wang, Jian
    Lin, Hongfei
    2020 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2020: 495-498
  • [5] Protein-DNA binding sites prediction based on pre-trained protein language model and contrastive learning
    Liu, Yufan
    Tian, Boxue
    BRIEFINGS IN BIOINFORMATICS, 2024, 25 (01)
  • [6] Reduced nDsbD sheds light on protein-protein interactions in disulfide cascades
    Saridakis, Emmanuel
    Mavridou, Despoina A. I.
    Kritsiligkou, Paraskevi
    Goddard, Alan D.
    Stevens, Julie M.
    Ferguson, Stuart J.
    Redfield, Christina
    ACTA CRYSTALLOGRAPHICA A-FOUNDATION AND ADVANCES, 2011, 67: C350
  • [7] Prediction of Protein-Protein Interactions Using Vision Transformer and Language Model
    Jha, Kanchan
    Saha, Sriparna
    Karmakar, Sourav
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (05): 3215-3225
  • [8] Protein-small molecule binding site prediction based on a pre-trained protein language model with contrastive learning
    Wang, Jue
    Liu, Yufan
    Tian, Boxue
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [9] PreAlgPro: Prediction of allergenic proteins with pre-trained protein language model and efficient neutral network
    Zhang, Lingrong
    Liu, Taigang
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2024, 280
  • [10] Computational prediction of protein-protein interactions' network in Arabidopsis thaliana
    Hekmati, Zhale
    Zahiri, Javad
    Aalami, Ali
    ACTA PHYSIOLOGIAE PLANTARUM, 2023, 45 (12)