nablaDFT: Large-Scale Conformational Energy and Hamiltonian Prediction benchmark and dataset

Cited by: 8
Authors
Khrabrov, Kuzma [1]
Shenbin, Ilya [3]
Ryabov, Alexander [4, 5]
Tsypin, Artem [1]
Telepov, Alexander [1]
Alekseev, Anton [3, 7]
Grishin, Alexander [1]
Strashnov, Pavel [1]
Zhilyaev, Petr [4]
Nikolenko, Sergey [3, 6]
Kadurin, Artur [1, 2]
Affiliations
[1] AIRI, Kutuzovskiy Prospect House 32 Bldg K1, Moscow 121170, Russia
[2] Kuban State Univ, Stavropolskaya St 149, Krasnodar 350040, Russia
[3] Russian Acad Sci, Steklov Math Inst, St Petersburg Dept, Nab R Fontanki 27, St Petersburg 191011, Russia
[4] Skolkovo Inst Sci & Technol, Ctr Mat Technol, Bolshoy Blvd 30,Bld 1, Moscow 121205, Russia
[5] Natl Res Univ, Moscow Inst Phys & Technol, Inst Sky Lane 9, Dolgoprudnyi 141700, Moscow Region, Russia
[6] ISP RAS Res Ctr Trusted Artificial Intelligence, Alexander Solzhenitsyn St 25, Moscow 109004, Russia
[7] St Petersburg Univ, 7-9 Univ Skaya Embankment, St Petersburg 199034, Russia
Keywords
CHEMICAL UNIVERSE; DENSITY FUNCTIONALS; VIRTUAL EXPLORATION; ACCURATE; SYSTEMS
DOI
10.1039/d2cp03966d
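Chinese Library Classification (CLC) number
O64 [Physical chemistry (theoretical chemistry), chemical physics]
Subject classification codes
070304; 081704
Abstract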
Electronic wave function calculation is a fundamental task of computational quantum chemistry. Knowledge of the wave function parameters allows one to compute physical and chemical properties of molecules and materials. Unfortunately, it is infeasible to compute the wave functions analytically even for simple molecules. Classical quantum chemistry approaches such as the Hartree-Fock method or density functional theory (DFT) make it possible to compute an approximation of the wave function but are computationally very expensive. One way to lower the computational cost is to use machine learning models that can provide sufficiently good approximations far more cheaply. In this work we: (1) introduce a new curated large-scale dataset of electronic structures of drug-like molecules, (2) establish a novel benchmark for the estimation of molecular properties in the multi-molecule setting, and (3) evaluate a wide range of methods with this benchmark. We show that the accuracy of recently developed machine learning models deteriorates significantly when switching from the single-molecule to the multi-molecule setting. We also show that these models generalize poorly across different chemistry classes. In addition, we provide experimental evidence that larger datasets lead to better ML models in the field of quantum chemistry.
Pages: 25853-25863
Page count: 11