Single-sequence protein structure prediction using a language model and deep learning

Cited by: 0
Authors
Ratul Chowdhury
Nazim Bouatta
Surojit Biswas
Christina Floristean
Anant Kharkar
Koushik Roy
Charlotte Rochereau
Gustaf Ahdritz
Joanna Zhang
George M. Church
Peter K. Sorger
Mohammed AlQuraishi
Affiliations
[1] Laboratory of Systems Pharmacology, Department of Biomedical Informatics
[2] Program in Therapeutic Science, Department of Computer Science
[3] Harvard Medical School, Integrated Program in Cellular, Molecular, and Biomedical Studies
[4] Harvard Medical School, Department of Systems Biology
[5] Nabla Bio, Department of Systems Biology
[6] Inc.
[7] Columbia University
[8] Columbia University
[9] Columbia University
[10] Harvard Medical School
Source
Nature Biotechnology | 2022 / Vol. 40, Issue 11
DOI
Not available
Abstract
AlphaFold2 and related computational systems predict protein structure using deep learning and co-evolutionary relationships encoded in multiple sequence alignments (MSAs). Despite high prediction accuracy achieved by these systems, challenges remain in (1) prediction of orphan and rapidly evolving proteins for which an MSA cannot be generated; (2) rapid exploration of designed structures; and (3) understanding the rules governing spontaneous polypeptide folding in solution. Here we report development of an end-to-end differentiable recurrent geometric network (RGN2) that uses a protein language model (AminoBERT) to learn latent structural information from unaligned proteins. A linked geometric module compactly represents Cα backbone geometry in a translationally and rotationally invariant way. On average, RGN2 outperforms AlphaFold2 and RoseTTAFold on orphan proteins and classes of designed proteins while achieving up to a 10^6-fold reduction in compute time. These findings demonstrate the practical and theoretical strengths of protein language models relative to MSAs in structure prediction.
Pages: 1617-1623
Page count: 7
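
The abstract notes that Cα backbone geometry can be represented in a translationally and rotationally invariant way. The sketch below illustrates that general idea only; it is not the paper's geometric module. It assumes a plain internal-coordinate description (virtual bond lengths, bond angles, and dihedral angles along the Cα trace), and the function names `internal_coordinates` and `random_rigid_transform` are hypothetical. It checks numerically that these quantities do not change when the whole chain is rotated and shifted.

```python
import numpy as np


def internal_coordinates(ca):
    """Internal coordinates of an (N, 3) array of C-alpha positions.

    Returns (lengths, angles, dihedrals) with N-1, N-2 and N-3 entries.
    Illustrative only: this is an assumed representation, not RGN2's.
    """
    ca = np.asarray(ca, dtype=float)
    bonds = ca[1:] - ca[:-1]                    # virtual bonds between neighbours
    lengths = np.linalg.norm(bonds, axis=1)
    u = bonds / lengths[:, None]                # unit bond vectors

    # Angle at each interior atom between the directions to its two neighbours.
    cos_angle = np.clip(np.sum(-u[:-1] * u[1:], axis=1), -1.0, 1.0)
    angles = np.arccos(cos_angle)

    # Signed dihedral about each interior bond, from the two plane normals.
    n1 = np.cross(u[:-2], u[1:-1])
    n2 = np.cross(u[1:-1], u[2:])
    x = np.sum(n1 * n2, axis=1)
    y = np.sum(np.cross(n1, n2) * u[1:-1], axis=1)
    dihedrals = np.arctan2(y, x)
    return lengths, angles, dihedrals


def random_rigid_transform(rng):
    """Random proper rotation matrix (determinant +1) and translation vector."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0                         # turn a reflection into a rotation
    return q, rng.normal(size=3)


# Synthetic coordinates stand in for a real C-alpha trace (assumption).
rng = np.random.default_rng(0)
trace = rng.normal(size=(10, 3))
rotation, shift = random_rigid_transform(rng)
moved = trace @ rotation.T + shift

before = internal_coordinates(trace)
after = internal_coordinates(moved)
```

The helper returns proper rotations only because a mirror image would flip the sign of every dihedral angle; lengths and bond angles would be unaffected.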