The global trace graph, a novel paradigm for searching protein sequence databases

Cited by: 17
Authors
Heger, Andreas
Mallick, Swapan
Wilton, Christopher
Holm, Liisa
Affiliations
[1] Univ Helsinki, Inst Biotechnol, FI-00014 Helsinki, Finland
[2] Univ Helsinki, Dept Biol & Environm Sci, Div Genet, FI-00014 Helsinki, Finland
[3] Univ Oxford, Dept Physiol Anat & Genet, MRC, Funct Genet Unit, Oxford OX1 3QX, England
[4] Harvard Univ, Sch Med, Dept Genet, Boston, MA USA
[5] Babraham Inst, Cambridge, England
DOI
10.1093/bioinformatics/btm358
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Propagating functional annotations to sequence-similar, presumably homologous proteins lies at the heart of the bioinformatics industry. Correct propagation is crucially dependent on the accurate identification of subtle sequence motifs that are conserved in evolution. The evolutionary signal can be difficult to detect because functional sites may consist of non-contiguous residues while segments in-between may be mutated without affecting fold or function.
Results: Here, we report a novel graph clustering algorithm in which all known protein sequences simultaneously self-organize into hypothetical multiple sequence alignments. This eliminates noise so that non-contiguous sequence motifs can be tracked down between extremely distant homologues. The novel data structure enables fast sequence database searching methods which are superior to profile-profile comparison at recognizing distant homologues. This study will boost the leverage of structural and functional genomics and opens up new avenues for data mining a complete set of functional signature motifs.
Pages: 2361-2367
Page count: 7