RCSB Protein Data Bank: Efficient Searching and Simultaneous Access to One Million Computed Structure Models Alongside the PDB Structures Enabled by Architectural Advances

Cited by: 25
Authors
Bittrich, Sebastian [1]
Bhikadiya, Charmi [1]
Bi, Chunxiao [1]
Chao, Henry [2, 3]
Duarte, Jose M. [1]
Dutta, Shuchismita [2, 3, 4]
Fayazi, Maryam [2, 3]
Henry, Jeremy [1]
Khokhriakov, Igor [1]
Lowe, Robert [2, 3]
Piehl, Dennis W. [2, 3]
Segura, Joan [1]
Vallat, Brinda [2, 3, 4]
Voigt, Maria [2, 3]
Westbrook, John D. [2, 3, 4]
Burley, Stephen K. [1, 2, 3, 4, 5]
Rose, Yana [1]
Affiliations
[1] Univ Calif La Jolla, San Diego Supercomp Ctr, Res Collaboratory Struct Bioinformat Prot Data Ba, La Jolla, CA 92093 USA
[2] Rutgers State Univ, Res Collaboratory Struct Bioinformat Prot Data Ban, Piscataway, NJ 08854 USA
[3] Rutgers State Univ, Inst Quantitat Biomed, Piscataway, NJ 08854 USA
[4] Rutgers State Univ, Canc Inst New Jersey, New Brunswick, NJ 08901 USA
[5] Rutgers State Univ, Dept Chem & Chem Biol, Piscataway, NJ 08854 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
FAIR principles; computer architecture; databases; structural biology; protein structure prediction; BIOLOGICAL MACROMOLECULES; SEQUENCE; ACCURACY; TOOLS;
DOI
10.1016/j.jmb.2023.167994
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) provides open access to experimentally-determined three-dimensional (3D) structures of biomolecules. The RCSB PDB RCSB.org research-focused web portal is used annually by many millions of users around the world. They access biostructure information, run complex queries utilizing various search services (e.g., full-text, structural and chemical attribute, chemical, sequence, and structure similarity searches), and visualize macromolecules in 3D, all at no charge and with no limitations on data usage. Notwithstanding more than 24,000-fold growth of the PDB over the past five decades, experimentally-determined structures are only available for a small subset of the millions of proteins of known sequence. Recently developed machine learning software tools can predict 3D structures of proteins at accuracies comparable to lower-resolution experimental methods. The RCSB PDB now provides access to ~1,000,000 Computed Structure Models (CSMs) of proteins coming from AlphaFold DB and the ModelArchive alongside ~200,000 experimentally-determined PDB structures. Both CSMs and PDB structures are available on RCSB.org and via well-established RCSB PDB Data, Search, and 1D-Coordinates application programming interfaces (APIs). Simultaneous delivery of PDB data and CSMs provides users with access to complementary structural information across the human proteome and those of model organisms and selected pathogens. API enhancements are backwards-compatible and programmatic users can "opt in" to access CSMs with minimal effort. Herein, we describe modifications to RCSB PDB cyberinfrastructure required to support sixfold scaling of 3D biostructure data delivery and lay the groundwork for scaling to accommodate hundreds of millions of CSMs.
© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
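The "opt in" access to CSMs mentioned in the abstract can be illustrated with a request body for the RCSB PDB Search API. This is a minimal sketch, not the authors' code. It assumes the public Search API v2 JSON format, in which `request_options.results_content_type` lists the content types to return ("experimental" for PDB structures, "computed" for CSMs) and in which omitting the option is assumed to return experimental structures only. The helper name `build_search_request` and the search term are hypothetical.

```python
import json


def build_search_request(text, include_csms=False):
    """Build a full-text search request body as a plain dict.

    Assumption: the field names follow the public RCSB PDB Search API v2
    format; verify them against the current API documentation.
    """
    request = {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
    }
    if include_csms:
        # Opt in: ask for Computed Structure Models as well as PDB entries.
        request["request_options"] = {
            "results_content_type": ["experimental", "computed"]
        }
    return request


if __name__ == "__main__":
    # Print the opt-in request body; no network call is made here.
    body = build_search_request("hemoglobin", include_csms=True)
    print(json.dumps(body, indent=2))
```

The resulting dict can be serialized with `json.dumps` and sent as the body of an HTTP POST to the Search API endpoint (assumed here to be https://search.rcsb.org/rcsbsearch/v2/query). Leaving `include_csms` at its default produces a request with no `request_options`, which matches the backwards-compatible behavior the abstract describes.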
Pages: 10
Related papers
9 in total
  • [1] RCSB Protein Data Bank: Architectural Advances Towards Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive
    Rose, Yana
    Duarte, Jose M.
    Lowe, Robert
    Segura, Joan
    Bi, Chunxiao
    Bhikadiya, Charmi
    Chen, Li
    Rose, Alexander S.
    Bittrich, Sebastian
    Burley, Stephen K.
    Westbrook, John D.
    JOURNAL OF MOLECULAR BIOLOGY, 2021, 433 (11)
  • [2] RCSB Protein Data Bank: visualizing groups of experimentally determined PDB structures alongside computed structure models of proteins
    Segura, Joan
    Rose, Yana
    Bi, Chunxiao
    Duarte, Jose
    Burley, Stephen K.
    Bittrich, Sebastian
    FRONTIERS IN BIOINFORMATICS, 2023, 3
  • [3] RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning
    Burley, Stephen K.
    Bhikadiya, Charmi
    Bi, Chunxiao
    Bittrich, Sebastian
    Chao, Henry
    Chen, Li
    Craig, Paul A.
    Crichlow, Gregg V.
    Dalenberg, Kenneth
    Duarte, Jose M.
    Dutta, Shuchismita
    Fayazi, Maryam
    Feng, Zukang
    Flatt, Justin W.
    Ganesan, Sai
    Ghosh, Sutapa
    Goodsell, David S.
    Green, Rachel Kramer
    Guranovic, Vladimir
    Henry, Jeremy
    Hudson, Brian P.
    Khokhriakov, Igor
    Lawson, Catherine L.
    Liang, Yuhe
    Lowe, Robert
    Peisach, Ezra
    Persikova, Irina
    Piehl, Dennis W.
    Rose, Yana
    Sali, Andrej
    Segura, Joan
    Sekharan, Monica
    Shao, Chenghua
    Vallat, Brinda
    Voigt, Maria
    Webb, Ben
    Westbrook, John D.
    Whetstone, Shamara
    Young, Jasmine Y.
    Zalevsky, Arthur
    Zardecki, Christine
    NUCLEIC ACIDS RESEARCH, 2023, 51 (D1) : D488 - D508
  • [4] RCSB Protein Data Bank: Integrated Searching and Efficient Access to Macromolecular Structure Data from the PDB Archive
    Hudson, Brian
    Rose, Yana
    Duarte, Jose M.
    Lowe, Robert
    Bi, Chunxiao
    Bhikadiya, Charmi
    Chen, Li
    Bittrich, Sebastian
    Segura, Joan
    Burley, Stephen
    Westbrook, John
    Rose, Alexander S.
    ACTA CRYSTALLOGRAPHICA A-FOUNDATION AND ADVANCES, 2021, 77 : A253 - A253
  • [5] Parallel delivery of experimentally determined structures and computed structure models at RCSB protein data bank (RCSB PDB, RCSB.ORG)
    Piehl, Dennis W.
    Burley, Stephen K.
    BIOPHYSICAL JOURNAL, 2024, 123 (03) : 280A - 280A
  • [6] Updated resources for exploring experimentally-determined PDB structures and Computed Structure Models at the RCSB Protein Data Bank
    Burley, Stephen K.
    Bhatt, Rusham
    Bhikadiya, Charmi
    Bi, Chunxiao
    Biester, Alison
    Biswas, Pratyoy
    Bittrich, Sebastian
    Blaumann, Santiago
    Brown, Ronald
    Chao, Henry
    Chithari, Vivek Reddy
    Craig, Paul A.
    Crichlow, Gregg V.
    Duarte, Jose M.
    Dutta, Shuchismita
    Feng, Zukang
    Flatt, Justin W.
    Ghosh, Sutapa
    Goodsell, David S.
    Green, Rachel Kramer
    Guranovic, Vladimir
    Henry, Jeremy
    Hudson, Brian P.
    Joy, Michael
    Kaelber, Jason T.
    Khokhriakov, Igor
    Lai, Jhih-Siang
    Lawson, Catherine L.
    Liang, Yuhe
    Myers-Turnbull, Douglas
    Peisach, Ezra
    Persikova, Irina
    Piehl, Dennis W.
    Pingale, Aditya
    Rose, Yana
    Sagendorf, Jared
    Sali, Andrej
    Segura, Joan
    Sekharan, Monica
    Shao, Chenghua
    Smith, James
    Trumbull, Michael
    Vallat, Brinda
    Voigt, Maria
    Webb, Ben
    Whetstone, Shamara
    Wu-Wu, Amy
    Xing, Tongji
    Young, Jasmine Y.
    Zalevsky, Arthur
    NUCLEIC ACIDS RESEARCH, 2024, 53 (D1)
  • [7] Exploring experimental structures and computed structure models from artificial intelligence/machine learning at RCSB Protein Data Bank (RCSB PDB, RCSB.org)
    Segura, Joan
    Duarte, Jose
    Bittrich, Sebastian
    Bi, Chunxiao
    Bhikadiya, Charmi
    Fayazi, Maryam
    Henry, Jeremy
    Khokhriakov, Igor
    Lowe, Robert
    Piehl, Dennis W.
    Vallat, Brinda
    Voigt, Maria
    Westbrook, John
    Rose, Yana
    Burley, Stephen K.
    BIOPHYSICAL JOURNAL, 2023, 122 (03) : 282A - 282A
  • [8] Exploring Experimental PDB Structures and Computed Structure Models from Artificial Intelligence/Machine Learning at RCSB Protein Data Bank (RCSB.org)
    Zardecki, Christine
    Craig, Paul
    Burley, Stephen
    JOURNAL OF BIOLOGICAL CHEMISTRY, 2023, 299 (03) : S210 - S210
  • [9] Training Opportunities: Exploring Experimental PDB Structures and Computed Structure Models from Artificial Intelligence/Machine Learning at RCSB Protein Data Bank (RCSB.org)
    Zardecki, Christine
    Burley, Stephen
    JOURNAL OF BIOLOGICAL CHEMISTRY, 2024, 300 (03) : S5 - S5