The global trace graph, a novel paradigm for searching protein sequence databases

Cited by: 17
Authors
Heger, Andreas
Mallick, Swapan
Wilton, Christopher
Holm, Liisa
Affiliations
[1] Univ Helsinki, Inst Biotechnol, FI-00014 Helsinki, Finland
[2] Univ Helsinki, Dept Biol & Environm Sci, Div Genet, FI-00014 Helsinki, Finland
[3] Univ Oxford, Dept Physiol Anat & Genet, MRC, Funct Genet Unit, Oxford OX1 3QX, England
[4] Harvard Univ, Sch Med, Dept Genet, Boston, MA USA
[5] Babraham Inst, Cambridge, England
DOI
10.1093/bioinformatics/btm358
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Propagating functional annotations to sequence-similar, presumably homologous proteins lies at the heart of the bioinformatics industry. Correct propagation is crucially dependent on the accurate identification of subtle sequence motifs that are conserved in evolution. The evolutionary signal can be difficult to detect because functional sites may consist of non-contiguous residues while segments in-between may be mutated without affecting fold or function.
Results: Here, we report a novel graph clustering algorithm in which all known protein sequences simultaneously self-organize into hypothetical multiple sequence alignments. This eliminates noise so that non-contiguous sequence motifs can be tracked down between extremely distant homologues. The novel data structure enables fast sequence database searching methods which are superior to profile-profile comparison at recognizing distant homologues. This study will boost the leverage of structural and functional genomics and opens up new avenues for data mining a complete set of functional signature motifs.
Pages: 2361-2367
Number of pages: 7