A novel sequence alignment algorithm based on deep learning of the protein folding code

Cited by: 13
Authors
Gao, Mu [1]
Skolnick, Jeffrey [1]
Affiliations
[1] Georgia Inst Technol, Sch Biol Sci, Ctr Study Syst Biol, Atlanta, GA 30332 USA
Keywords
HOMOLOGY DETECTION; TWILIGHT ZONE; PSI-BLAST; IDENTIFICATION; RECOGNITION; TOOL;
DOI
10.1093/bioinformatics/btaa810
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Motivation: From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the 'twilight zone' of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent 'd'). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.

Results: To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure alpha-helical proteins successfully recognizes pairs of structurally related pure beta-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ~150% better than HHsearch for generating pairwise alignments and ~50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.
Pages: 490-496
Number of pages: 7