A novel sequence alignment algorithm based on deep learning of the protein folding code

Cited by: 13
Authors
Gao, Mu [1]
Skolnick, Jeffrey [1]
Affiliation
[1] Georgia Inst Technol, Sch Biol Sci, Ctr Study Syst Biol, Atlanta, GA 30332 USA
Keywords
HOMOLOGY DETECTION; TWILIGHT ZONE; PSI-BLAST; IDENTIFICATION; RECOGNITION; TOOL
DOI
10.1093/bioinformatics/btaa810
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: From evolutionary inference and function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the 'twilight zone' of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent 'd'). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures.
Results: To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure alpha-helical proteins successfully recognizes pairs of structurally related pure beta-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ~150% better than HHsearch at generating pairwise alignments and ~50% better at identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration.
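The 'twilight zone' in the abstract refers to pairs of proteins whose aligned residues share only a small fraction of identical amino acids. As a rough illustration of that quantity only, the sketch below computes percent identity from an already-aligned pair of sequences. It is not part of SAdLSA; the function name `percent_identity`, the gap character, and the choice of denominator (aligned, non-gap columns) are assumptions made here, and other definitions of identity use a different denominator. The abstract gives no numeric threshold; the range often quoted in the literature is roughly 20-35% identity.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned (non-gap) columns of a pairwise alignment.

    Hypothetical helper for illustration; the denominator convention
    (columns where neither sequence has a gap) is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    aligned_cols = 0  # columns where both sequences have a residue
    identical = 0     # aligned columns where the residues match

    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        aligned_cols += 1
        if x == y:
            identical += 1

    # Return 0.0 when no residues are aligned, to avoid dividing by zero
    return 100.0 * identical / aligned_cols if aligned_cols else 0.0


if __name__ == "__main__":
    # Toy alignment: 6 aligned columns, 5 of them identical
    print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
```

A pair scoring low on this measure can still share a fold, which is the situation the abstract says existing sequence-based methods often miss.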
Pages: 490-496
Number of pages: 7