MoleculeNet: a benchmark for molecular machine learning

Cited by: 1338
Authors
Wu, Zhenqin [1]
Ramsundar, Bharath [2]
Feinberg, Evan N. [3]
Gomes, Joseph [1]
Geniesse, Caleb [3]
Pappu, Aneesh S. [2]
Leswing, Karl [4]
Pande, Vijay [1]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[3] Stanford Sch Med, Program Biophys, Stanford, CA 94305 USA
[4] Schrodinger Inc, New York, NY USA
Keywords
NEURAL-NETWORKS; AQUEOUS SOLUBILITY; PDBBIND DATABASE; FREE-ENERGIES; PREDICTION; CHEMOINFORMATICS; VALIDATION; COLLECTION; DRUGS
DOI
10.1039/c7sc02664a
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the availability of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark for comparing the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats: learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.
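The abstract notes that MoleculeNet establishes evaluation metrics and that learnable representations struggle with highly imbalanced classification. As a minimal, hypothetical illustration of why the choice of metric matters on imbalanced tasks, the sketch below compares plain accuracy with a rank-based ROC-AUC (Mann-Whitney formulation). The labels, scores, and function names are made-up toy assumptions; this is neither MoleculeNet data nor DeepChem code, and ROC-AUC is used only as one example of a rank-based metric.

```python
# Hypothetical illustration: accuracy vs. ROC-AUC on an imbalanced binary task.
# The labels and scores below are toy values, not MoleculeNet data.

def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation; ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 2 actives among 20 compounds (10% positives).
labels = [1, 1] + [0] * 18
# A trivial model that scores every compound 0.0 ("predict inactive").
trivial = [0.0] * 20
# A model that ranks both actives above all inactives, but below the 0.5 threshold.
ranker = [0.4, 0.3] + [0.1] * 18

print(accuracy(labels, trivial), roc_auc(labels, trivial))  # 0.9 0.5
print(accuracy(labels, ranker), roc_auc(labels, ranker))    # 0.9 1.0
```

Both toy models reach 0.9 accuracy by effectively predicting the majority class, while ROC-AUC separates the uninformative model (0.5) from the one that ranks the actives first (1.0). The quadratic pairwise loop is chosen for readability; a sort-based computation scales better to large datasets.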
Pages: 513-530
Number of pages: 18