Real-time structural motif searching in proteins using an inverted index strategy

Cited by: 16
Authors
Bittrich, Sebastian [1]
Burley, Stephen K. [1, 2, 3, 4, 5]
Rose, Alexander S. [1]
Affiliations
[1] Univ Calif San Diego, San Diego Supercomp Ctr, RCSB Prot Data Bank, La Jolla, CA 92093 USA
[2] Rutgers State Univ, RCSB Prot Data Bank, Inst Quantitat Biomed, Piscataway, NJ USA
[3] Rutgers State Univ, Dept Chem & Chem Biol, Piscataway, NJ USA
[4] Rutgers State Univ, Canc Inst New Jersey, New Brunswick, NJ USA
[5] Univ Calif San Diego, Skaggs Sch Pharm & Pharmaceut Sci, La Jolla, CA USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
ALGORITHM; SUPERFAMILY; MECHANISM;
DOI
10.1371/journal.pcbi.1008502
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Author summary: The Protein Data Bank (PDB) provides open access to more than 170,000 three-dimensional structures of proteins, nucleic acids, and biological complexes. Similarities between PDB structures give valuable functional and evolutionary insights, but such resemblance may not be evident at the sequence or global structure level. Throughout the database, there are recurring structural motifs: groups of modest numbers of residues in proximity that, for example, support catalytic activity. Identification of common structural motifs can reveal similarities between proteins and serve as fingerprints for spatial configurations of amino acids, such as the His-Asp-Ser catalytic triad found in serine proteases or the zinc coordination site found in Zinc Finger DNA-binding domains. We present a highly efficient yet flexible strategy that allows users, for the first time, to search for arbitrary structural motifs across the entire PDB archive in real time. Our approach scales favorably with the increasing number and complexity of deposited structures and has the potential to be adapted for other applications in a macromolecular context.
Biochemical and biological functions of proteins are the product of both the overall fold of the polypeptide chain and, typically, structural motifs made up of smaller numbers of amino acids that constitute a catalytic center or a binding site and may be remote from one another in amino acid sequence. Detection of such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, this remains an extremely challenging problem because of the size of the Protein Data Bank (PDB) archive. Existing methods depend on clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing >170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures that contain the query motif and ignoring the majority, which are irrelevant. Our approach (implemented at ) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed for analyzing 3D structures of both proteins and nucleic acids.
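The general idea of an inverted index for motif lookup can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a deliberately simplified descriptor (two residue labels plus a distance bin of 1 Å), exact bin matching with no tolerance, and toy coordinates. The names `pair_key`, `build_index`, and `candidates`, and all structure identifiers, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def pair_key(res_a, res_b, bin_width=1.0):
    """Descriptor for a residue pair: sorted labels plus a binned distance.

    Assumption: each residue is a (label, (x, y, z)) tuple with one
    representative coordinate; the real method may use richer descriptors.
    """
    (label_a, xyz_a), (label_b, xyz_b) = res_a, res_b
    labels = tuple(sorted((label_a, label_b)))
    return labels + (int(dist(xyz_a, xyz_b) // bin_width),)


def build_index(structures, cutoff=12.0):
    """Map each pair descriptor to the set of structure IDs containing it."""
    index = defaultdict(set)
    for structure_id, residues in structures.items():
        for a, b in combinations(residues, 2):
            if dist(a[1], b[1]) <= cutoff:  # skip pairs that are far apart
                index[pair_key(a, b)].add(structure_id)
    return index


def candidates(index, motif):
    """Intersect the posting lists of every residue pair in the query motif."""
    postings = [index.get(pair_key(a, b), set())
                for a, b in combinations(motif, 2)]
    return set.intersection(*postings) if postings else set()


# Toy data: only "toyA" has a His-Asp-Ser arrangement with matching distances.
structures = {
    "toyA": [("HIS", (0, 0, 0)), ("ASP", (3.2, 0, 0)), ("SER", (0, 4.5, 0))],
    "toyB": [("HIS", (0, 0, 0)), ("ASP", (9.5, 0, 0)), ("SER", (0, 4.5, 0))],
    "toyC": [("GLY", (0, 0, 0)), ("ALA", (3.0, 0, 0))],
}
motif = [("HIS", (1, 1, 1)), ("ASP", (4.4, 1, 1)), ("SER", (1, 5.2, 1))]

index = build_index(structures)
hits = candidates(index, motif)
```

Because the lookup touches only the posting lists for the query's residue pairs, structures that lack those pairs are never examined. In this sketch the intersection is only a candidate filter; a practical search would presumably still verify each candidate's geometry, for example by superposition, and would tolerate small deviations in distance.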
Pages: 15