Integration of pre-trained protein language models into geometric deep learning networks

Cited by: 10
Authors
Wu, Fang [1]
Wu, Lirong [1]
Radev, Dragomir [2]
Xu, Jinbo [3, 4]
Li, Stan Z. [1]
Affiliations
[1] Westlake Univ, AI Res & Innovat Lab, Hangzhou 310030, Peoples R China
[2] Yale Univ, Dept Comp Sci, New Haven, CT 06511 USA
[3] Tsinghua Univ, Inst AI Ind Res, Haidian St, Beijing 100084, Peoples R China
[4] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
Keywords
PREDICTION; COLLECTION; BENCHMARK;
DOI
10.1038/s42003-023-05133-1
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several preceding studies consider combining these different protein modalities to promote the representation power of geometric neural networks but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.
Pages: 8
Related papers
50 items in total
  • [1] Integration of pre-trained protein language models into geometric deep learning networks
    Fang Wu
    Lirong Wu
    Dragomir Radev
    Jinbo Xu
    Stan Z. Li
    Communications Biology, 6
  • [2] Deep Entity Matching with Pre-Trained Language Models
    Li, Yuliang
    Li, Jinfeng
    Suhara, Yoshihiko
    Doan, AnHai
    Tan, Wang-Chiew
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2020, 14 (01): 50 - 60
  • [3] Kurdish Sign Language Recognition Using Pre-Trained Deep Learning Models
    Alsaud, Ali A.
    Yousif, Raghad Z.
    Aziz, Marwan M.
    Kareem, Shahab W.
    Maho, Amer J.
    JOURNAL OF ELECTRICAL SYSTEMS, 2024, 20 (06) : 1334 - 1344
  • [4] LMPred: predicting antimicrobial peptides using pre-trained language models and deep learning
    Dee, William
    Gromiha, Michael
    BIOINFORMATICS ADVANCES, 2022, 2 (01):
  • [5] Pre-Trained Language Models and Their Applications
    Wang, Haifeng
    Li, Jiwei
    Wu, Hua
    Hovy, Eduard
    Sun, Yu
    ENGINEERING, 2023, 25 : 51 - 65
  • [6] Meta Distant Transfer Learning for Pre-trained Language Models
    Wang, Chengyu
    Pan, Haojie
    Qiu, Minghui
    Yang, Fei
    Huang, Jun
    Zhang, Yin
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9742 - 9752
  • [7] Annotating Columns with Pre-trained Language Models
    Suhara, Yoshihiko
    Li, Jinfeng
    Li, Yuliang
    Zhang, Dan
    Demiralp, Cagatay
    Chen, Chen
    Tan, Wang-Chiew
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 1493 - 1503
  • [8] LaoPLM: Pre-trained Language Models for Lao
    Lin, Nankai
    Fu, Yingwen
    Yang, Ziyu
    Chen, Chuwei
    Jiang, Shengyi
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6506 - 6512
  • [9] PhoBERT: Pre-trained language models for Vietnamese
    Dat Quoc Nguyen
    Anh Tuan Nguyen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1037 - 1042
  • [10] Deciphering Stereotypes in Pre-Trained Language Models
    Ma, Weicheng
    Scheible, Henry
    Wang, Brian
    Veeramachaneni, Goutham
    Chowdhary, Pratim
    Sung, Alan
    Koulogeorge, Andrew
    Wang, Lili
    Yang, Diyi
    Vosoughi, Soroush
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 11328 - 11345