MISATO: machine learning dataset of protein-ligand complexes for structure-based drug discovery

Cited by: 8
Authors
Siebenmorgen, Till [1, 2]
Menezes, Filipe [1, 2]
Benassou, Sabrina [3]
Merdivan, Erinc [4]
Didi, Kieran [5]
Mourao, Andre Santos Dias [1, 2]
Kitel, Radoslaw [6]
Lio, Pietro [5]
Kesselheim, Stefan [3]
Piraud, Marie [4]
Theis, Fabian J. [4, 7, 8]
Sattler, Michael [1, 2]
Popowicz, Grzegorz M. [1, 2]
Affiliations
[1] Helmholtz Munich, Inst Struct Biol, Mol Targets & Therapeut Ctr, Neuherberg, Germany
[2] Tech Univ Munich, Bayer NMR Zent, TUM Sch Nat Sci, Dept Biosci, Garching, Germany
[3] Forschungszentrum Julich, Julich Supercomp Ctr, Julich, Germany
[4] Helmholtz Munich, Helmholtz AI, Neuherberg, Germany
[5] Univ Cambridge, Comp Lab, Cambridge, England
[6] Jagiellonian Univ, Fac Chem, Krakow, Poland
[7] Helmholtz Munich, Inst Computat Biol, Computat Hlth Ctr, Neuherberg, Germany
[8] Tech Univ Munich, TUM Sch Computat Informat & Technol, Garching, Germany
Source
NATURE COMPUTATIONAL SCIENCE | 2024 / Volume 4 / Issue 05
Keywords
SCORING FUNCTION; FORCE-FIELD; BINDING; AFFINITY; EFFICIENT; MODELS; PARAMETERIZATION; GENERATION; PREDICTION; ACCURACY;
DOI
10.1038/s43588-024-00627-2
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Large language models have greatly enhanced our ability to understand biology and chemistry, yet robust methods for structure-based drug discovery, quantum chemistry and structural biology are still sparse. Precise biomolecule-ligand interaction datasets are urgently needed for large language models. To address this, we present MISATO, a dataset that combines quantum mechanical properties of small molecules and associated molecular dynamics simulations of approximately 20,000 experimental protein-ligand complexes with extensive validation of experimental data. Starting from the existing experimental structures, semi-empirical quantum mechanics was used to systematically refine these structures. A large collection of molecular dynamics traces of protein-ligand complexes in explicit water is included, accumulating over 170 μs. We give examples of machine learning (ML) baseline models proving an improvement of accuracy by employing our data. An easy entry point for ML experts is provided to enable the next generation of drug discovery artificial intelligence models.
Pages: 367 - 378
Page count: 14
Related papers
50 in total
  • [1] Characterising protein-ligand binding in support of structure-based drug discovery
    Murray, James
    Hubbard, Roderick E.
    ACTA CRYSTALLOGRAPHICA A-FOUNDATION AND ADVANCES, 2008, 64 : C122 - C122
  • [2] Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction
    Zhang, Yunjiang
    Li, Shuyuan
    Meng, Kong
    Sun, Shaorui
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (05) : 1456 - 1472
  • [3] Correction to "Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction"
    Zhang, Yunjiang
    Li, Shuyuan
    Meng, Kong
    Sun, Shaorui
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (19) : 7826 - 7826
  • [4] Automated protein-ligand crystallography for structure-based drug design
    Mooij, Wijnand T. M.
    Hartshorn, Michael J.
    Tickle, Ian J.
    Sharff, Andrew J.
    Verdonk, Marcel L.
    Jhoti, Harren
    CHEMMEDCHEM, 2006, 1 (08) : 827 - 838
  • [5] Molecular Docking for Drug Discovery: Machine-Learning Approaches for Native Pose Prediction of Protein-Ligand Complexes
    Ashtawy, Hossam M.
    Mahapatra, Nihar R.
    COMPUTATIONAL INTELLIGENCE METHODS FOR BIOINFORMATICS AND BIOSTATISTICS: 10TH INTERNATIONAL MEETING, 2014, 8452 : 15 - 32
  • [6] PharmRF: A machine-learning scoring function to identify the best protein-ligand complexes for structure-based pharmacophore screening with high enrichments
    Kumar, Sivakumar Prasanth
    Dixit, Nandan Y.
    Patel, Chirag N.
    Rawal, Rakesh M.
    Pandya, Himanshu A.
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2022, 43 (12) : 847 - 863
  • [7] A multidimensional dataset for structure-based machine learning
    Holcomb, Matthew
    Forli, Stefano
    NATURE COMPUTATIONAL SCIENCE, 2024, 4 (05) : 318 - 319
  • [8] Visualizing structure-based deep learning scoring functions for protein-ligand interactions
    Koes, David
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2019, 258
  • [9] PocketAnchor: Learning structure-based pocket representations for protein-ligand interaction prediction
    Li, Shuya
    Tian, Tingzhong
    Zhang, Ziting
    Zou, Ziheng
    Zhao, Dan
    Zeng, Jianyang
    CELL SYSTEMS, 2023, 14 (08) : 692 - +
  • [10] Structure-Based Drug Screening and Ligand-Based Drug Screening with Machine Learning
    Fukunishi, Yoshifumi
    COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING, 2009, 12 (04) : 397 - 408