Exploring evolution-aware & -free protein language models as protein function predictors

Authors
Hu, Mingyang [1]
Yuan, Fajie [1]
Yang, Kevin K. [2]
Ju, Fusong [3]
Su, Jin [1]
Wang, Hui [1]
Yang, Fei [4]
Ding, Qiuyang [1]
Affiliations
[1] Westlake Univ, Hangzhou, Peoples R China
[2] Microsoft Res New England, Cambridge, MA USA
[3] Microsoft Res Asia, Beijing, Peoples R China
[4] Zhejiang Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large-scale Protein Language Models (PLMs) have improved performance in protein prediction tasks, ranging from 3D structure prediction to various function predictions. In particular, AlphaFold [23], a ground-breaking AI system, could potentially reshape structural biology. However, the utility of the PLM module in AlphaFold, Evoformer, has not been explored beyond structure prediction. In this paper, we investigate the representation ability of three popular PLMs: ESM-1b (single sequence) [35], MSA-Transformer (multiple sequence alignment) [30] and Evoformer (structural), with a special focus on Evoformer. Specifically, we aim to answer the following key questions: (i) Does the Evoformer trained as part of AlphaFold produce representations amenable to predicting protein function? (ii) If yes, can Evoformer replace ESM-1b and MSA-Transformer? (iii) How much do these PLMs rely on evolution-related protein data? In this regard, are they complementary to each other? We compare these models by empirical study along with new insights and conclusions. All code and datasets for reproducibility are available at https://github.com/elttaes/Revisiting-PLMs.
Pages: 12