nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset

Cited by: 9

Authors:

Khrabrov, Kuzma ^{[1]}

Shenbin, Ilya ^{[3]}

Ryabov, Alexander ^{[4,5]}

Tsypin, Artem ^{[1]}

Telepov, Alexander ^{[1]}

Alekseev, Anton ^{[3,7]}

Grishin, Alexander ^{[1]}

Strashnov, Pavel ^{[1]}

Zhilyaev, Petr ^{[4]}

Nikolenko, Sergey ^{[3,6]}

Kadurin, Artur ^{[1,2]}

Affiliations:

[1] AIRI, Kutuzovskiy Prospect House 32 Bldg K1, Moscow 121170, Russia

[2] Kuban State Univ, Stavropolskaya St 149, Krasnodar 350040, Russia

[3] Russian Acad Sci, Steklov Math Inst, St Petersburg Dept, Nab R Fontanki 27, St Petersburg 191011, Russia

[4] Skolkovo Inst Sci & Technol, Ctr Mat Technol, Bolshoy Blvd 30,Bld 1, Moscow 121205, Russia

[5] Natl Res Univ, Moscow Inst Phys & Technol, Inst Sky Lane 9, Dolgoprudnyi 141700, Moscow Region, Russia

[6] ISP RAS Res Ctr Trusted Artificial Intelligence, Alexander Solzhenitsyn St 25, Moscow 109004, Russia

[7] St Petersburg Univ, 7-9 Univ Skaya Embankment, St Petersburg 199034, Russia

Source:

PHYSICAL CHEMISTRY CHEMICAL PHYSICS | 2022 / Vol. 24 / Issue 42

Keywords:

CHEMICAL UNIVERSE; DENSITY FUNCTIONALS; VIRTUAL EXPLORATION; ACCURATE; SYSTEMS;

DOI:

10.1039/d2cp03966d

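Chinese Library Classification (CLC) number:

O64 [Physical chemistry (theoretical chemistry), chemical physics];

Discipline classification codes: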

070304 ; 081704 ;

Abstract:

Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) allow one to compute an approximation of the wave function but are computationally very expensive. One way to lower the computational cost is to use machine learning models that can provide sufficiently good approximations much more cheaply. In this work we: (1) introduce a new curated large-scale dataset of electronic structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models generalize poorly across different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry.


Pages: 25853-25863

Page count: 11

