Arabidopsis;
Protein-protein interactions;
Machine learning;
Pre-trained language model;
Natural language processing;
ANNOTATION;
DATABASES;
DOI:
10.1186/s13007-023-01119-6
Chinese Library Classification (CLC) number:
Q5 [Biochemistry];
Discipline classification code:
071010 ;
081704 ;
Abstract:
Background: Protein-protein interactions (PPIs) are involved in many biological processes. Identifying PPIs in the model plant Arabidopsis is therefore important for understanding plant growth and development and, in turn, for supporting basic research on crop improvement. Although many Arabidopsis PPIs have been determined experimentally, the known Arabidopsis interactome is far from complete. Effective machine learning models that learn from existing PPI data to predict unknown Arabidopsis PPIs conveniently and rapidly are therefore still urgently needed.

Results: We used a large-scale pre-trained protein language model (pLM), ESM-1b, to convert protein sequences into high-dimensional vectors, which then served as the input to a multilayer perceptron (MLP). To avoid the performance overestimation that frequently occurs in PPI prediction, we trained and evaluated the predictive model on stringent datasets. The combination of ESM-1b and MLP (i.e., ESMAraPPI) was more accurate than predictive models built from other pLMs or from baseline sequence encoding schemes. In particular, ESMAraPPI yielded an AUPR value of 0.810 on an independent test set in which both proteins of each pair are unseen in the training dataset, suggesting strong generalization and extrapolation ability. Moreover, ESMAraPPI performed better than several state-of-the-art generic or plant-specific PPI predictors.

Conclusion: Protein sequence embeddings from the pre-trained model ESM-1b contain rich protein semantic information. Combined with the MLP algorithm, ESM-1b showed excellent performance in predicting Arabidopsis PPIs. We anticipate that the proposed predictive model (ESMAraPPI) can serve as a very competitive tool to accelerate the identification of the Arabidopsis interactome.
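The stringent evaluation described in the abstract, a test set in which both proteins of every pair are unseen during training, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `split_unseen_proteins`, the `test_fraction` and `seed` parameters, and the choice to drop pairs that mix a training protein with a held-out protein are all assumptions made for illustration.

```python
import random


def split_unseen_proteins(pairs, test_fraction=0.2, seed=0):
    """Split labelled protein pairs so that no test protein occurs in training.

    pairs: iterable of (protein_a, protein_b, label) tuples.
    Returns (train_pairs, test_pairs). Hypothetical helper for illustration.
    """
    pairs = list(pairs)
    # Collect every protein identifier that appears in any pair.
    proteins = sorted({p for a, b, _ in pairs for p in (a, b)})
    rng = random.Random(seed)
    rng.shuffle(proteins)

    # Hold out a fraction of the proteins (at least one) for testing.
    n_test = max(1, int(len(proteins) * test_fraction))
    held_out = set(proteins[:n_test])

    train, test = [], []
    for a, b, label in pairs:
        a_held, b_held = a in held_out, b in held_out
        if a_held and b_held:
            test.append((a, b, label))    # both proteins unseen in training
        elif not a_held and not b_held:
            train.append((a, b, label))   # both proteins available for training
        # Mixed pairs are dropped (an assumption of this sketch).
    return train, test
```

Because mixed pairs are discarded, the two returned sets can together be smaller than the input; that loss of data is the price of guaranteeing that every test protein is new to the model.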
Author affiliations:
Wu, Fang: Westlake Univ, AI Res & Innovat Lab, Hangzhou 310030, Peoples R China
Wu, Lirong: Westlake Univ, AI Res & Innovat Lab, Hangzhou 310030, Peoples R China
Radev, Dragomir: Yale Univ, Dept Comp Sci, New Haven, CT 06511 USA
Xu, Jinbo: Tsinghua Univ, Inst AI Ind Res, Haidian St, Beijing 100084, Peoples R China; Toyota Technol Inst Chicago, Chicago, IL 60637 USA
Li, Stan Z.: Westlake Univ, AI Res & Innovat Lab, Hangzhou 310030, Peoples R China
Author affiliations:
Xu, Chang: Tianjin Univ, Coll Intelligence & Comp, 135 YaGuan Rd, Tianjin, Peoples R China; Anhui Normal Univ, Wuhu City, Anhui, Peoples R China
Jiang, Limin: Tianjin Univ, Coll Intelligence & Comp, 135 YaGuan Rd, Tianjin, Peoples R China
Zhang, Zehua: Tianjin Univ, Coll Intelligence & Comp, 135 YaGuan Rd, Tianjin, Peoples R China
Yu, Xuyao: Tianjin Med Univ Canc Inst & Hosp, Dept Radiotherapy, Tianjin, Peoples R China
Chen, Renhai: Tianjin Univ, Coll Intelligence & Comp, 135 YaGuan Rd, Tianjin, Peoples R China; Tianjin Univ, Shenzhen Res Inst, Coll Intelligence & Comp, Tianjin, Peoples R China
Xu, Junhai: Tianjin Univ, Coll Intelligence & Comp, 135 YaGuan Rd, Tianjin, Peoples R China