A method for multiple-sequence-alignment-free protein structure prediction using a protein language model

Cited by: 36
Authors
Fang, Xiaomin [1 ]
Wang, Fan [1 ]
Liu, Lihang [1 ]
He, Jingzhou [1 ]
Lin, Dayong [1 ]
Xiang, Yingfei [1 ]
Zhu, Kunrui [1 ]
Zhang, Xiaonan [1 ]
Wu, Hua [1 ]
Li, Hui [2 ]
Song, Le [2 ]
Affiliations
[1] Baidu Inc, NLP, Shenzhen, People's Republic of China
[2] BioMap, Beijing, People's Republic of China
DOI
10.1038/s42256-023-00721-6
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Protein structure prediction pipelines based on artificial intelligence, such as AlphaFold2, have achieved near-experimental accuracy. These pipelines rely mainly on multiple sequence alignments (MSAs) as inputs, from which they learn co-evolution information across homologous sequences. However, searching protein databases for MSAs is time consuming, usually taking tens of minutes. We therefore explore the limits of fast protein structure prediction using only the primary structures of proteins. Our proposed method, HelixFold-Single, combines a large-scale protein language model with the geometric learning capability of AlphaFold2. HelixFold-Single first pre-trains a large-scale protein language model on thousands of millions of primary structures using self-supervised learning; this model serves as an alternative to MSAs for learning co-evolution information. Then, by combining the pre-trained protein language model with the essential components of AlphaFold2, we obtain an end-to-end differentiable model that predicts the three-dimensional coordinates of atoms from the primary structure alone. HelixFold-Single is validated on the CASP14 and CAMEO datasets, achieving accuracy competitive with MSA-based methods on targets with large homologous families. Furthermore, HelixFold-Single takes much less time than mainstream protein structure prediction pipelines, demonstrating its potential in tasks requiring many predictions.

AlphaFold2 has revolutionized bioinformatics, but its ability to predict protein structures with high accuracy comes at the price of a costly database search for multiple sequence alignments. Fang and colleagues pre-train a large-scale protein language model and use it in conjunction with AlphaFold2 as a fully trainable and efficient model for structure prediction.
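
The data flow described in the abstract (sequence, then language-model representations, then a geometric module, then coordinates) can be illustrated with a toy sketch. This is not the HelixFold-Single implementation: every function name here is hypothetical, no weights are learned, and the "language model" and "structure module" are trivial stand-ins chosen only to show that no MSA search appears anywhere in the path.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_language_model(sequence, dim=4, seed=0):
    """Hypothetical stand-in for the pre-trained protein language model.

    Returns a per-residue embedding (L x dim) and a pairwise map (L x L)
    built from embedding dot products. Nothing here is learned.
    """
    rng = random.Random(seed)
    table = {aa: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for aa in AMINO_ACIDS}
    single = [table[aa] for aa in sequence]
    pair = [[sum(a * b for a, b in zip(si, sj)) for sj in single] for si in single]
    return single, pair


def toy_structure_module(single, pair):
    """Hypothetical stand-in for the AlphaFold2-derived geometric components.

    Maps the two representations to one (x, y, z) point per residue.
    """
    length = len(single)
    coords = []
    for i in range(length):
        x = 3.8 * i                # arbitrary spacing along one axis
        y = sum(pair[i]) / length  # mean of row i of the pairwise map
        z = single[i][0]           # first embedding component
        coords.append((x, y, z))
    return coords


def predict_structure(sequence):
    """Sequence in, coordinates out: the only input is the primary structure."""
    single, pair = toy_language_model(sequence)
    return toy_structure_module(single, pair)


coords = predict_structure("MKTAYIAK")
print(len(coords))  # one coordinate triple per residue
```

In the method the abstract describes, both stages are trained networks and the whole path is differentiable end to end; the sketch keeps only the interface between the stages.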
Pages: 1087-1096
Page count: 10