nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset

Cited by: 9
Authors
Khrabrov, Kuzma [1]
Shenbin, Ilya [3]
Ryabov, Alexander [4, 5]
Tsypin, Artem [1]
Telepov, Alexander [1]
Alekseev, Anton [3, 7]
Grishin, Alexander [1]
Strashnov, Pavel [1]
Zhilyaev, Petr [4]
Nikolenko, Sergey [3, 6]
Kadurin, Artur [1, 2]
Affiliations
[1] AIRI, Kutuzovskiy Prospect House 32 Bldg K1, Moscow 121170, Russia
[2] Kuban State Univ, Stavropolskaya St 149, Krasnodar 350040, Russia
[3] Russian Acad Sci, Steklov Math Inst, St Petersburg Dept, Nab R Fontanki 27, St Petersburg 191011, Russia
[4] Skolkovo Inst Sci & Technol, Ctr Mat Technol, Bolshoy Blvd 30,Bld 1, Moscow 121205, Russia
[5] Natl Res Univ, Moscow Inst Phys & Technol, Inst Sky Lane 9, Dolgoprudnyi 141700, Moscow Region, Russia
[6] ISP RAS Res Ctr Trusted Artificial Intelligence, Alexander Solzhenitsyn St 25, Moscow 109004, Russia
[7] St Petersburg Univ, 7-9 Univ Skaya Embankment, St Petersburg 199034, Russia
Keywords
CHEMICAL UNIVERSE; DENSITY FUNCTIONALS; VIRTUAL EXPLORATION; ACCURATE; SYSTEMS;
DOI
10.1039/d2cp03966d
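Chinese Library Classification number
O64 [Physical chemistry (theoretical chemistry), chemical physics];
Discipline classification code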
070304 ; 081704 ;
Abstract
Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) make it possible to compute an approximation of the wave function but are computationally very expensive. One way to lower the computational complexity is to use machine learning models that can provide sufficiently good approximations at a much lower computational cost. In this work we: (1) introduce a new curated large-scale dataset of electronic structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models generalize poorly across different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry.
Pages: 25853-25863
Number of pages: 11