Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding

Cited by: 19
Authors
Melvin, Iain [1]
Weston, Jason [2]
Noble, William Stafford [3]
Leslie, Christina [4]
Affiliations
[1] NEC Labs Amer, Princeton, NJ USA
[2] Google, New York, NY USA
[3] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[4] Mem Sloan Kettering Canc Ctr, Computat Biol Program, New York, NY 10021 USA
Source
PLOS COMPUTATIONAL BIOLOGY | 2011 / Vol. 7 / Issue 01
Keywords
HOMOLOGY DETECTION; DATABASE; SERVER;
DOI
10.1371/journal.pcbi.1001047
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods (i.e., measures of similarity between query and target sequences) provide the engine for sequence database search and have been the subject of 30 years of computational research. For the difficult problem of detecting remote evolutionary relationships between protein sequences, the most successful pairwise comparison methods involve building local models (e.g., profile hidden Markov models) of protein sequences. However, recent work in massive data domains like web search and natural language processing demonstrates the advantage of exploiting the global structure of the data space. Motivated by this work, we present a large-scale algorithm called PROTEMBED, which learns an embedding of protein sequences into a low-dimensional "semantic space." Evolutionarily related proteins are embedded in close proximity, and additional pieces of evidence, such as 3D structural similarity or class labels, can be incorporated into the learning process. We find that PROTEMBED achieves superior accuracy to widely used pairwise sequence methods like PSI-BLAST and HHSearch for remote homology detection; it also outperforms our previous RANKPROP algorithm, which incorporates global structure in the form of a protein similarity network. Finally, the PROTEMBED embedding space can be visualized, both at the global level and local to a given query, yielding intuition about the structure of protein sequence space.
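The abstract's central idea, that related proteins sit close together in an embedding space, can be illustrated with a minimal sketch. This is not the PROTEMBED algorithm: the embedding is not learned here, the use of Euclidean distance is an assumption, and the protein names, the 2-D coordinates, and the function `rank_by_proximity` are hypothetical placeholders chosen only to show how a query could be ranked against a database once embeddings exist.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_proximity(query_vec, database):
    """Return database ids ordered from nearest to farthest from the query.

    `database` maps a protein id to its embedding vector. In a real system
    these vectors would come from a trained embedding model; here they are
    supplied by hand.
    """
    return sorted(database, key=lambda pid: euclidean(query_vec, database[pid]))


# Hypothetical 2-D embeddings (made-up values, for illustration only).
embeddings = {
    "protA": (0.10, 0.20),
    "protB": (0.15, 0.25),
    "protC": (0.90, 0.80),
}

query = (0.12, 0.21)  # hypothetical embedding of the query sequence
ranking = rank_by_proximity(query, embeddings)
print(ranking)
```

Under these assumptions, proteins nearest the query in the embedding space are reported first, which is the sense in which proximity stands in for evolutionary relatedness.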
Pages: 8