Real-time structural motif searching in proteins using an inverted index strategy

Times cited: 16
Authors
Bittrich, Sebastian [1]
Burley, Stephen K. [1,2,3,4,5]
Rose, Alexander S. [1]
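Affiliations
[1] University of California San Diego, San Diego Supercomputer Center, RCSB Protein Data Bank, La Jolla, CA 92093, USA
[2] Rutgers, The State University of New Jersey, RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Piscataway, NJ, USA
[3] Rutgers, The State University of New Jersey, Department of Chemistry and Chemical Biology, Piscataway, NJ, USA
[4] Rutgers, The State University of New Jersey, Cancer Institute of New Jersey, New Brunswick, NJ, USA
[5] University of California San Diego, Skaggs School of Pharmacy and Pharmaceutical Sciences, La Jolla, CA, USA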
Funding
National Science Foundation; National Institutes of Health
Keywords
ALGORITHM; SUPERFAMILY; MECHANISM
DOI
10.1371/journal.pcbi.1008502
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract

Author summary
The Protein Data Bank (PDB) provides open access to more than 170,000 three-dimensional structures of proteins, nucleic acids, and biological complexes. Similarities between PDB structures give valuable functional and evolutionary insights, but such resemblance may not be evident at the level of sequence or global structure. Throughout the database there are recurring structural motifs: small groups of residues in close spatial proximity that, for example, support catalytic activity. Identifying common structural motifs can reveal similarities between proteins, and motifs can serve as fingerprints for spatial configurations of amino acids, such as the His-Asp-Ser catalytic triad found in serine proteases or the zinc coordination site found in zinc finger DNA-binding domains. We present a highly efficient yet flexible strategy that allows users, for the first time, to search for arbitrary structural motifs across the entire PDB archive in real time. Our approach scales favorably with the increasing number and complexity of deposited structures and has the potential to be adapted for other applications in a macromolecular context.

Biochemical and biological functions of proteins are the product of both the overall fold of the polypeptide chain and, typically, structural motifs: small sets of amino acids that form a catalytic center or a binding site and may be remote from one another in the amino acid sequence. Detecting such structural motifs can provide valuable insights into the function(s) of previously uncharacterized proteins. Technically, motif detection remains an extremely challenging problem because of the size of the PDB archive. Existing methods rely on clustering by sequence similarity and can be computationally slow. We have developed a new approach that uses an inverted index strategy capable of analyzing more than 170,000 PDB structures with unmatched speed. The efficiency of the inverted index method depends critically on identifying the small number of structures that contain the query motif while ignoring the large majority of structures that are irrelevant. Our approach (implemented at ) enables real-time retrieval and superposition of structural motifs, either extracted from a reference structure or uploaded by the user. Herein, we describe the method and present five case studies that exemplify its efficacy and speed for analyzing 3D structures of both proteins and nucleic acids.
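This record does not describe what the index actually stores, so the following is only a toy sketch of the general inverted-index idea, not the authors' implementation. It assumes a hypothetical descriptor for every pair of residues (the two residue types plus a distance rounded down to a 1 Å bin), maps each descriptor to the set of structures containing it, and answers a motif query by intersecting those sets, smallest first, so that irrelevant structures are discarded early. The names `pair_descriptors`, `build_index`, `candidates`, and `BIN_WIDTH`, as well as the toy IDs and coordinates, are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

# ASSUMPTION: a residue-pair descriptor is (residue type A, residue type B,
# distance bin). The real index descriptors are not described in this record.
BIN_WIDTH = 1.0  # hypothetical bin width in angstroms


def pair_descriptors(residues):
    """Yield one descriptor per residue pair.

    residues: list of (resname, (x, y, z)) tuples, e.g. C-alpha positions.
    """
    for (name_a, xyz_a), (name_b, xyz_b) in combinations(residues, 2):
        a, b = sorted((name_a, name_b))  # key independent of residue order
        yield (a, b, int(dist(xyz_a, xyz_b) // BIN_WIDTH))


def build_index(structures):
    """Inverted index: descriptor -> set of structure IDs containing it."""
    index = defaultdict(set)
    for struct_id, residues in structures.items():
        for key in pair_descriptors(residues):
            index[key].add(struct_id)
    return index


def candidates(index, motif):
    """Return IDs of structures containing every pair descriptor of motif."""
    postings = [index.get(key, set()) for key in set(pair_descriptors(motif))]
    if not postings:
        return set()
    postings.sort(key=len)  # intersect the rarest descriptors first
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:  # no structure can match; stop early
            break
    return result


# Toy data: hypothetical IDs and invented coordinates, not real PDB entries.
structures = {
    "toy1": [("HIS", (0.0, 0.0, 0.0)), ("ASP", (3.5, 0.0, 0.0)), ("SER", (0.0, 4.2, 0.0))],
    "toy2": [("HIS", (0.0, 0.0, 0.0)), ("GLY", (3.5, 0.0, 0.0)), ("SER", (0.0, 4.2, 0.0))],
}
query = [("HIS", (10.0, 10.0, 10.0)), ("ASP", (13.5, 10.0, 10.0)), ("SER", (10.0, 14.2, 10.0))]

index = build_index(structures)
print(candidates(index, query))  # {'toy1'}
```

The query is the same three-residue arrangement translated in space, so it matches `toy1` regardless of absolute position, while `toy2` is rejected because it lacks any His-Asp pair. A structure returned here only contains every pair type somewhere; in a sketch like this, candidates would still need a final superposition and RMSD check, and distances that fall near a bin edge can be missed unless neighboring bins are also looked up.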
Pages: 15