Protein embedding based alignment

Authors
Iovino, Benjamin Giovanni [1]
Ye, Yuzhen [1]
Affiliations
[1] Indiana Univ, Luddy Sch Informat Comp & Engn, 700 N Woodlawn Ave, Bloomington, IN 47408 USA
Keywords
Protein embedding; Protein sequence alignment; Smith-Waterman algorithm; Twilight zone; SEQUENCE; LANGUAGE; SEARCH
DOI
10.1186/s12859-024-05699-5
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Purpose: Despite considerable progress in alignment algorithms, aligning divergent protein sequences with less than 20-35% pairwise identity (the so-called "twilight zone") remains a difficult problem. Many alignment algorithms have relied on substitution matrices since those matrices were introduced in the 1970s; however, the matrices do not score alignments well within the twilight zone. We developed Protein Embedding based Alignments, or PEbA, to better align sequences with low pairwise identity. Like the traditional Smith-Waterman algorithm, PEbA uses dynamic programming, but the matching score for a pair of amino acids is based on the similarity of their embeddings from a protein language model.
Methods: We tested PEbA on over twelve thousand benchmark pairwise alignments from BAliBASE, each extracted from one of its multiple sequence alignments. Five BAliBASE references were used, each with different sequence identities, motifs, and lengths, showing how well PEbA aligns under different circumstances.
Results: PEbA greatly outperformed BLOSUM substitution matrix-based pairwise alignments. The improvement in alignment quality varied with the similarity of the sequence pair, and PEbA performed over four times as well for pairs of sequences with <10% identity. We also compared PEbA using embeddings generated by different protein language models (ProtT5 and ESM-2) and found that ProtT5-XL-U50 produced the most useful embeddings for aligning protein sequences. PEbA also outperformed DEDAL and vcMSA, two recently developed alignment methods based on protein language model embeddings.
Conclusion: Our results suggest that general-purpose protein language models provide useful contextual information for generating more accurate protein alignments than typically used methods.
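The idea described in the abstract, Smith-Waterman-style dynamic programming in which the substitution-matrix lookup is replaced by an embedding similarity, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the similarity function or the gap model, so cosine similarity and a linear gap penalty are assumptions, and the names `cosine` and `embedding_local_align` are hypothetical. In practice the per-residue vectors would come from a protein language model such as ProtT5; toy vectors stand in for them here.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors (assumed scoring function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

STOP, DIAG, UP, LEFT = 0, 1, 2, 3

def embedding_local_align(emb_a, emb_b, gap=-1.0):
    """Local alignment of two sequences given per-residue embeddings.

    emb_a, emb_b: lists of equal-dimension vectors, one per residue.
    gap: linear gap penalty (an assumption; not taken from the abstract).
    Returns (best_score, aligned index pairs as (i, j), 0-based).
    """
    n, m = len(emb_a), len(emb_b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]   # score matrix
    T = [[STOP] * (m + 1) for _ in range(n + 1)]  # traceback pointers
    best, best_pos = 0.0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Embedding similarity replaces the substitution-matrix score.
            diag = H[i - 1][j - 1] + cosine(emb_a[i - 1], emb_b[j - 1])
            up = H[i - 1][j] + gap
            left = H[i][j - 1] + gap
            score = max(0.0, diag, up, left)
            H[i][j] = score
            if score == 0.0:
                T[i][j] = STOP
            elif score == diag:
                T[i][j] = DIAG
            elif score == up:
                T[i][j] = UP
            else:
                T[i][j] = LEFT
            if score > best:
                best, best_pos = score, (i, j)

    # Walk back from the highest-scoring cell until a zero cell is reached.
    pairs = []
    i, j = best_pos
    while T[i][j] != STOP:
        if T[i][j] == DIAG:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif T[i][j] == UP:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs

# Toy 2-dimensional "embeddings" standing in for language-model output.
toy_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_b = [[0.0, 1.0], [1.0, 1.0]]
score, pairs = embedding_local_align(toy_a, toy_b)
print(score, pairs)
```

Because cosine similarity lies in [-1, 1], the gap penalty has to be chosen on the same scale; a real implementation would likely rescale the similarities and tune gap opening and extension penalties separately.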
Pages: 16