Filling-in Void and Sparse Regions in Protein Sequence Space by Protein-Like Artificial Sequences Enables Remarkable Enhancement in Remote Homology Detection Capability

Cited by: 12
Authors
Mudgal, Richa [1]
Sowdhamini, Ramanathan [2]
Chandra, Nagasuma [3]
Srinivasan, Narayanaswamy [4]
Sandhya, Sankaran [3]
Affiliations
[1] Indian Inst Sci, IISc Math Initiat, Bangalore 560012, Karnataka, India
[2] Univ Agr Sci Bangalore, Natl Ctr Biol Sci, Bangalore 560065, Karnataka, India
[3] Indian Inst Sci, Dept Biochem, Bangalore 560012, Karnataka, India
[4] Indian Inst Sci, Mol Biophys Unit, Bangalore 560012, Karnataka, India
Keywords
remote homology detection; in silico protein design; protein evolution; HIDDEN MARKOV-MODELS; FAMILIES; DATABASE; FOLD; INFORMATION; EVOLUTION; ALIGNMENT; PROFILES; SEARCHES; DOMAINS
DOI
10.1016/j.jmb.2013.11.026
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Protein functional annotation relies on the identification of accurate relationships, sequence divergence being a key factor. This is especially evident when distant protein relationships are demonstrated only with three-dimensional structures. To address this challenge, we describe a computational approach to purposefully bridge gaps between related protein families through directed design of protein-like "linker" sequences. For this, we represented SCOP domain families, integrated with sequence homologues, as multiple profiles and performed HMM-HMM alignments between related domain families. Where convincing alignments were achieved, we applied a roulette wheel-based method to design 3,611,010 protein-like sequences corresponding to 374 SCOP folds. To analyze their ability to link proteins in homology searches, we used 3024 queries to search two databases, one containing only natural sequences and another one additionally containing designed sequences. Augmented database searches showed up to 30% improvement in fold coverage for over 74% of the folds, with 52 folds achieving all theoretically possible connections. Although sequences could not be designed between some families, the availability of designed sequences between other families within the fold established the sequence continuum to demonstrate 373 difficult relationships. Ultimately, as a practical and realistic extension, we demonstrate that such protein-like sequences can be "plugged into" routine and generic sequence database searches to empower not only remote homology detection but also fold recognition. Our findings, which are well supported statistically, show that complementary searches in both databases will increase the effectiveness of sequence-based searches in recognizing all homologues sharing a common fold. © 2013 Elsevier Ltd. All rights reserved.
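The abstract names a roulette wheel-based method but gives no implementation details. As a rough illustration only, the following Python sketch shows generic roulette-wheel (fitness-proportionate) selection applied to aligned profile columns. The function names `roulette_pick` and `design_linker`, the dictionary representation of a profile column, and the choice to sum the two families' residue frequencies are all assumptions made for this example and are not taken from the paper.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_pick(weights, rng):
    """Pick one key of `weights` with probability proportional to its weight."""
    residues = list(weights)
    cumulative = list(accumulate(weights[r] for r in residues))
    # Spin the wheel: a point in [0, total) lands in exactly one residue's slice.
    spin = rng.random() * cumulative[-1]
    index = bisect_right(cumulative, spin)
    # Guard against floating-point rounding at the upper edge of the wheel.
    return residues[min(index, len(residues) - 1)]


def design_linker(profile_a, profile_b, seed=None):
    """Sample one sequence from two aligned profiles (hypothetical scheme).

    Each profile is a list of columns; each column maps a residue letter to
    its frequency. Columns are assumed to be already aligned position by
    position, e.g. by an HMM-HMM alignment done elsewhere.
    """
    rng = random.Random(seed)
    sequence = []
    for col_a, col_b in zip(profile_a, profile_b):
        # Assumption: combine the two families by summing their frequencies.
        combined = {
            r: col_a.get(r, 0.0) + col_b.get(r, 0.0)
            for r in sorted(set(col_a) | set(col_b))
        }
        sequence.append(roulette_pick(combined, rng))
    return "".join(sequence)


if __name__ == "__main__":
    family_a = [{"A": 0.7, "V": 0.3}, {"G": 1.0}, {"K": 0.5, "R": 0.5}]
    family_b = [{"V": 0.6, "L": 0.4}, {"G": 0.9, "S": 0.1}, {"R": 1.0}]
    print(design_linker(family_a, family_b, seed=42))
```

Each call yields one candidate sequence whose residues at every position are drawn in proportion to how often they occur in the two aligned families, so residues shared by both families are favoured while rarer ones still appear occasionally.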
Pages: 962-979
Page count: 18
Related papers
2 in total
  • [1] NrichD database: sequence databases enriched with computationally designed protein-like sequences aid in remote homology detection
    Mudgal, Richa
    Sandhya, Sankaran
    Kumar, Gayatri
    Sowdhamini, Ramanathan
    Chandra, Nagasuma R.
    Srinivasan, Narayanaswamy
    [J]. NUCLEIC ACIDS RESEARCH, 2015, 43(D1): D300-D305
  • [2] Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins
    Sandhya, S.
    Mudgal, R.
    Jayadev, C.
    Abhinandan, K. R.
    Sowdhamini, R.
    Srinivasan, N.
    [J]. MOLECULAR BIOSYSTEMS, 2012, 8(08): 2076-2084