Exploring evolution-aware & -free protein language models as protein function predictors

Cited by: 0
Authors
Hu, Mingyang [1 ]
Yuan, Fajie [1 ]
Yang, Kevin K. [2 ]
Ju, Fusong [3 ]
Su, Jin [1 ]
Wang, Hui [1 ]
Yang, Fei [4 ]
Ding, Qiuyang [1 ]
Affiliations
[1] Westlake Univ, Hangzhou, Peoples R China
[2] Microsoft Res New England, Cambridge, MA USA
[3] Microsoft Res Asia, Beijing, Peoples R China
[4] Zhejiang Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Large-scale Protein Language Models (PLMs) have improved performance on protein prediction tasks ranging from 3D structure prediction to various forms of function prediction. In particular, AlphaFold [23], a ground-breaking AI system, has the potential to reshape structural biology. However, the utility of AlphaFold's PLM module, Evoformer, has not been explored beyond structure prediction. In this paper, we investigate the representational ability of three popular PLMs: ESM-1b (single sequence) [35], MSA-Transformer (multiple sequence alignment) [30], and Evoformer (structural), with a special focus on Evoformer. Specifically, we aim to answer the following key questions: (i) Does the Evoformer trained as part of AlphaFold produce representations amenable to predicting protein function? (ii) If so, can Evoformer replace ESM-1b and MSA-Transformer? (iii) How much do these PLMs rely on evolution-related protein data, and, in this regard, are they complementary to each other? We compare these models through an empirical study and present new insights and conclusions. All code and datasets for reproducibility are available at https://github.com/elttaes/Revisiting-PLMs.
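To make the evaluation setup concrete, the sketch below shows how a pre-trained PLM such as ESM-1b [35] can be probed as a protein function predictor: per-residue embeddings are extracted from the frozen model, mean-pooled into a per-protein feature vector, and fed to a simple classifier. This is a minimal illustration using the public fair-esm package; the toy sequences, placeholder labels, and the logistic-regression probing head are assumptions for demonstration only, not the authors' exact pipeline (see the linked repository for that).

# Minimal sketch: probe a frozen PLM (ESM-1b via the public fair-esm package)
# as a protein function predictor. Sequences and labels are placeholders, and
# the logistic-regression head is only one possible probing choice.
import torch
import esm
from sklearn.linear_model import LogisticRegression

# Load ESM-1b (single-sequence PLM, 33 layers, 1280-dim embeddings).
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
model.eval()
batch_converter = alphabet.get_batch_converter()

# Toy (name, sequence) pairs with made-up binary function labels.
data = [
    ("prot_a", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVAT"),
    ("prot_b", "MIEITLKKPEDFLKVKETLTRMGIANNKDNSVLMQVLDSKGQVLAENLQGKELVSA"),
]
labels = [1, 0]  # placeholder labels, not data from the paper

_, _, tokens = batch_converter(data)
with torch.no_grad():
    out = model(tokens, repr_layers=[33])
reps = out["representations"][33]  # shape: (batch, tokens, 1280)

# Mean-pool per-residue embeddings, skipping the BOS/EOS special tokens.
features = [reps[i, 1 : len(seq) + 1].mean(0).numpy() for i, (_, seq) in enumerate(data)]

# Linear probe on the frozen embeddings (real experiments need far more data).
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features))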
Pages: 12