Pre-trained protein language model sheds new light on the prediction of Arabidopsis protein-protein interactions

Cited by: 4
Authors
Zhou, Kewei [1]
Lei, Chenping [1]
Zheng, Jingyan [1]
Huang, Yan [1]
Zhang, Ziding [1]
Affiliation
[1] China Agricultural University, College of Biological Sciences, State Key Laboratory of Animal Biotech Breeding, Beijing 100193, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Arabidopsis; Protein-protein interactions; Machine learning; Pre-trained language model; Natural language processing; ANNOTATION; DATABASES;
DOI
10.1186/s13007-023-01119-6
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Background: Protein-protein interactions (PPIs) are involved in many biological processes. Identifying PPIs in the model plant Arabidopsis is therefore important for understanding plant growth and development and, in turn, for supporting basic research on crop improvement. Although many Arabidopsis PPIs have been determined experimentally, the known Arabidopsis interactome is far from complete. Effective machine learning models that learn from existing PPI data to predict unknown Arabidopsis PPIs conveniently and rapidly are therefore still urgently needed.
Results: We used a large-scale pre-trained protein language model (pLM), ESM-1b, to convert protein sequences into high-dimensional vectors, which served as the input to a multilayer perceptron (MLP). To avoid the performance overestimation that frequently occurs in PPI prediction, we used stringent datasets to train and evaluate the predictive model. The combination of ESM-1b and MLP (i.e., ESMAraPPI) was more accurate than predictive models built on other pLMs or baseline sequence encoding schemes. In particular, ESMAraPPI yielded an AUPR value of 0.810 on an independent test set in which both proteins of each pair are unseen in the training dataset, suggesting strong generalization and extrapolation ability. Moreover, ESMAraPPI performed better than several state-of-the-art generic or plant-specific PPI predictors.
Conclusion: Protein sequence embeddings from the pre-trained model ESM-1b contain rich protein semantic information. Combined with the MLP algorithm, ESM-1b showed excellent performance in predicting Arabidopsis PPIs. We anticipate that ESMAraPPI can serve as a very competitive tool to accelerate the identification of the Arabidopsis interactome.
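The stringent evaluation mentioned in the abstract, where both proteins of every test pair are absent from the training data, can be illustrated with a small sketch. This is not the authors' procedure: the function name `split_pairs_by_protein`, the `test_fraction` parameter, and the toy protein identifiers are all hypothetical, and the rule of discarding pairs that straddle the two protein sets is an assumption made for illustration.

```python
import random


def split_pairs_by_protein(pairs, test_fraction=0.2, seed=0):
    """Split protein pairs so that test pairs share no protein with training pairs.

    Proteins (not pairs) are randomly assigned to a held-out set. A pair goes to
    the test set only if both of its proteins are held out, to the training set
    only if neither is, and is dropped otherwise (assumed rule, for illustration).
    """
    proteins = sorted({p for pair in pairs for p in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_held_out = max(1, int(len(proteins) * test_fraction))
    held_out = set(proteins[:n_held_out])

    train, test, dropped = [], [], []
    for a, b in pairs:
        if a in held_out and b in held_out:
            test.append((a, b))
        elif a not in held_out and b not in held_out:
            train.append((a, b))
        else:
            dropped.append((a, b))  # one protein seen in training, one not
    return train, test, dropped


# Toy, made-up identifiers; real data would use Arabidopsis protein IDs.
pairs = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4"),
    ("P5", "P6"), ("P5", "P7"), ("P6", "P8"), ("P7", "P8"),
    ("P1", "P5"), ("P2", "P6"), ("P9", "P10"), ("P9", "P1"),
]
train, test, dropped = split_pairs_by_protein(pairs, test_fraction=0.4, seed=1)

train_proteins = {p for pair in train for p in pair}
test_proteins = {p for pair in test for p in pair}
print(len(train), len(test), len(dropped))
print("protein overlap:", train_proteins & test_proteins)
```

Splitting at the protein level rather than the pair level is what prevents a model from scoring well merely by recognizing proteins it has already seen, which is the kind of overestimation the abstract refers to. The cost is that pairs straddling the two protein sets cannot be used in either partition under this rule.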
Pages: 10