The global trace graph, a novel paradigm for searching protein sequence databases

Cited by: 17
Authors
Heger, Andreas
Mallick, Swapan
Wilton, Christopher
Holm, Liisa
Affiliations
[1] Univ Helsinki, Inst Biotechnol, FI-00014 Helsinki, Finland
[2] Univ Helsinki, Dept Biol & Environm Sci, Div Genet, FI-00014 Helsinki, Finland
[3] Univ Oxford, Dept Physiol Anat & Genet, MRC, Funct Genet Unit, Oxford OX1 3QX, England
[4] Harvard Univ, Sch Med, Dept Genet, Boston, MA USA
[5] Babraham Inst, Cambridge, England
DOI
10.1093/bioinformatics/btm358
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Propagating functional annotations to sequence-similar, presumably homologous proteins lies at the heart of the bioinformatics industry. Correct propagation is crucially dependent on the accurate identification of subtle sequence motifs that are conserved in evolution. The evolutionary signal can be difficult to detect because functional sites may consist of non-contiguous residues while segments in-between may be mutated without affecting fold or function.
Results: Here, we report a novel graph clustering algorithm in which all known protein sequences simultaneously self-organize into hypothetical multiple sequence alignments. This eliminates noise so that non-contiguous sequence motifs can be tracked down between extremely distant homologues. The novel data structure enables fast sequence database searching methods which are superior to profile-profile comparison at recognizing distant homologues. This study will boost the leverage of structural and functional genomics and opens up new avenues for data mining a complete set of functional signature motifs.
Pages: 2361-2367
Page count: 7