Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding

Cited by: 19
Authors
Melvin, Iain [1]
Weston, Jason [2]
Noble, William Stafford [3]
Leslie, Christina [4]
Affiliations
[1] NEC Labs Amer, Princeton, NJ USA
[2] Google, New York, NY USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[4] Mem Sloan Kettering Canc Ctr, Computat Biol Program, New York, NY 10021 USA
Source
PLOS COMPUTATIONAL BIOLOGY | 2011 / Volume 7 / Issue 01
Keywords
HOMOLOGY DETECTION; DATABASE; SERVER
DOI
10.1371/journal.pcbi.1001047
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods (i.e., measures of similarity between query and target sequences) provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called PROTEMBED, which learns an embedding of protein sequences into a low-dimensional "semantic space." Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that PROTEMBED achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RANKPROP algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the PROTEMBED embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space.
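The idea that related proteins sit close together in an embedding space can be illustrated with a minimal sketch. This is not the PROTEMBED algorithm, whose training procedure is not described in this record; it only shows how a search could rank database entries by distance once some embedding is available. The function name rank_by_distance, the protein names, and the 3-dimensional coordinates are all hypothetical and made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_by_distance(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Rank database entries by Euclidean distance to the query, nearest first."""
    scored = [(name, math.dist(query, vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical coordinates: assumed to come from some already-learned embedding.
toy_embedding = {
    "protein_A": (0.1, 0.2, 0.0),
    "protein_B": (0.9, 0.8, 0.7),
    "protein_C": (0.2, 0.1, 0.1),
}
query_vector = (0.0, 0.0, 0.0)

ranking = rank_by_distance(query_vector, toy_embedding)
for name, distance in ranking:
    print(f"{name}\t{distance:.3f}")
```

In this toy setting the nearest entries would be reported as the most likely relatives of the query; Euclidean distance is an assumption here, and a real system might use a different metric.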
Pages: 8