Pre-trained protein language model sheds new light on the prediction of Arabidopsis protein-protein interactions

Times cited: 4
Authors
Zhou, Kewei [1]
Lei, Chenping [1]
Zheng, Jingyan [1]
Huang, Yan [1]
Zhang, Ziding [1]
Affiliations
[1] China Agr Univ, Coll Biol Sci, State Key Lab Anim Biotech Breeding, Beijing 100193, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Arabidopsis; Protein-protein interactions; Machine learning; Pre-trained language model; Natural language processing; ANNOTATION; DATABASES;
DOI
10.1186/s13007-023-01119-6
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Protein-protein interactions (PPIs) are heavily involved in many biological processes. Consequently, the identification of PPIs in the model plant Arabidopsis is of great significance to deeply understand plant growth and development, and then to promote the basic research of crop improvement. Although many experimental Arabidopsis PPIs have been determined currently, the known interactomic data of Arabidopsis is far from complete. In this context, developing effective machine learning models from existing PPI data to predict unknown Arabidopsis PPIs conveniently and rapidly is still urgently needed.

Results: We used a large-scale pre-trained protein language model (pLM) called ESM-1b to convert protein sequences into high-dimensional vectors and then used them as the input of multilayer perceptron (MLP). To avoid the performance overestimation frequently occurring in PPI prediction, we employed stringent datasets to train and evaluate the predictive model. The results showed that the combination of ESM-1b and MLP (i.e., ESMAraPPI) achieved more accurate performance than the predictive models inferred from other pLMs or baseline sequence encoding schemes. In particular, the proposed ESMAraPPI yielded an AUPR value of 0.810 when tested on an independent test set where both proteins in each protein pair are unseen in the training dataset, suggesting its strong generalization and extrapolating ability. Moreover, the proposed ESMAraPPI model performed better than several state-of-the-art generic or plant-specific PPI predictors.

Conclusion: Protein sequence embeddings from the pre-trained model ESM-1b contain rich protein semantic information. By combining with the MLP algorithm, ESM-1b revealed excellent performance in predicting Arabidopsis PPIs. We anticipate that the proposed predictive model (ESMAraPPI) can serve as a very competitive tool to accelerate the identification of Arabidopsis interactome.
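
The stringent evaluation described in the abstract, where both proteins of every test pair are unseen during training, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `strict_split` and `average_precision`, the protein identifiers, and the choice of average precision as an estimator of AUPR are all assumptions made for illustration, and the abstract does not state how the datasets were actually partitioned.

```python
import random


def strict_split(pairs, test_fraction=0.2, seed=0):
    """Split protein pairs so that no test protein occurs in a training pair.

    Hypothetical helper: proteins (not pairs) are partitioned first, and
    pairs that mix a training protein with a test protein are discarded.
    """
    proteins = sorted({p for pair in pairs for p in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])

    train, test = [], []
    for a, b in pairs:
        a_held_out = a in test_proteins
        b_held_out = b in test_proteins
        if a_held_out and b_held_out:
            test.append((a, b))       # both proteins unseen in training
        elif not a_held_out and not b_held_out:
            train.append((a, b))      # both proteins available for training
        # mixed pairs are dropped to keep the two sets protein-disjoint
    return train, test


def average_precision(labels, scores):
    """Average precision, one common estimator of the area under the PR curve."""
    ranked = sorted(zip(scores, labels), key=lambda item: -item[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Made-up identifiers, for illustration only.
example_pairs = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"),
                 ("P4", "P5"), ("P5", "P6"), ("P1", "P6")]
train_pairs, test_pairs = strict_split(example_pairs, test_fraction=0.5, seed=1)
```

Discarding mixed pairs shrinks the dataset, but it guarantees that a score computed on `test_pairs` reflects performance on proteins the model has never encountered, which is the setting the reported AUPR of 0.810 refers to.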
Pages: 10