Exploring evolution-aware & -free protein language models as protein function predictors

Cited by: 0
Authors
Hu, Mingyang [1 ]
Yuan, Fajie [1 ]
Yang, Kevin K. [2 ]
Ju, Fusong [3 ]
Su, Jin [1 ]
Wang, Hui [1 ]
Yang, Fei [4 ]
Ding, Qiuyang [1 ]
Affiliations
[1] Westlake Univ, Hangzhou, Peoples R China
[2] Microsoft Res New England, Cambridge, MA USA
[3] Microsoft Res Asia, Beijing, Peoples R China
[4] Zhejiang Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Large-scale Protein Language Models (PLMs) have improved performance in protein prediction tasks, ranging from 3D structure prediction to various function predictions. In particular, AlphaFold [23], a ground-breaking AI system, could potentially reshape structural biology. However, the utility of the PLM module in AlphaFold, Evoformer, has not been explored beyond structure prediction. In this paper, we investigate the representation ability of three popular PLMs: ESM-1b (single sequence) [35], MSA-Transformer (multiple sequence alignment) [30], and Evoformer (structural), with a special focus on Evoformer. Specifically, we aim to answer the following key questions: (i) Does the Evoformer trained as part of AlphaFold produce representations amenable to predicting protein function? (ii) If so, can Evoformer replace ESM-1b and MSA-Transformer? (iii) How much do these PLMs rely on evolution-related protein data, and in this regard, are they complementary to each other? We compare these models through an empirical study and present new insights and conclusions. All code and datasets for reproducibility are available at https://github.com/elttaes/Revisiting-PLMs.
Pages: 12