MoleculeNet: a benchmark for molecular machine learning

Cited by: 1338
Authors
Wu, Zhenqin [1]
Ramsundar, Bharath [2]
Feinberg, Evan N. [3]
Gomes, Joseph [1]
Geniesse, Caleb [3]
Pappu, Aneesh S. [2]
Leswing, Karl [4]
Pande, Vijay [1]
Affiliations
[1] Stanford Univ, Dept Chem, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
[3] Stanford Sch Med, Program Biophys, Stanford, CA 94305 USA
[4] Schrodinger Inc, New York, NY USA
Keywords
NEURAL-NETWORKS; AQUEOUS SOLUBILITY; PDBBIND DATABASE; FREE-ENERGIES; PREDICTION; CHEMOINFORMATICS; VALIDATION; COLLECTION; DRUGS
DOI
10.1039/c7sc02664a
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited by the lack of a standard benchmark for comparing the efficacy of proposed methods; most new algorithms are benchmarked on different datasets, making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large-scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high-quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open-source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats: learnable representations still struggle with complex tasks under data scarcity and with highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than the choice of a particular learning algorithm.
Pages: 513-530
Number of pages: 18
Related papers (50 in total)
  • [31] MLPerf: An Industry Standard Benchmark Suite for Machine Learning Performance
    Mattson, Peter
    Tang, Hanlin
    Wei, Gu-Yeon
    Wu, Carole-Jean
    Reddi, Vijay Janapa
    Cheng, Christine
    Coleman, Cody
    Diamos, Greg
    Kanter, David
    Micikevicius, Paulius
    Patterson, David
    Schmuelling, Guenther
    [J]. IEEE MICRO, 2020, 40 (02) : 8 - 16
  • [32] Wireless Network Simulation to Create Machine Learning Benchmark Data
    Katzef, Marc
    Cullen, Andrew C.
    Alpcan, Tansu
    Leckie, Christopher
    Kopacz, Justin
    [J]. 2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022, : 6378 - 6383
  • [33] A benchmark study of machine learning methods for molecular electronic transition: Tree-based ensemble learning versus graph neural network
    Kang, Beomchang
    Seok, Chaok
    Lee, Juyong
    [J]. BULLETIN OF THE KOREAN CHEMICAL SOCIETY, 2022, 43 (03) : 328 - 335
  • [34] Machine learning for molecular thermodynamics
    Ding, Jiaqi
    Xu, Nan
    Manh Tien Nguyen
    Qiao, Qi
    Shi, Yao
    He, Yi
    Shao, Qing
    [J]. CHINESE JOURNAL OF CHEMICAL ENGINEERING, 2021, 31 (03) : 227 - 239
  • [35] Machine learning in molecular ecology
    Fountain-Jones, Nicholas M.
    Smith, Megan L.
    Austerlitz, Frederic
    [J]. MOLECULAR ECOLOGY RESOURCES, 2021, 21 (08) : 2589 - 2597
  • [36] Machine Learning for Molecular Simulation
    Noe, Frank
    Tkatchenko, Alexandre
    Mueller, Klaus-Robert
    Clementi, Cecilia
    [J]. ANNUAL REVIEW OF PHYSICAL CHEMISTRY, VOL 71, 2020, 71 : 361 - 390
  • [37] Machine learning for molecular properties
    Tretiak, Sergei
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2019, 257
  • [38] Molecular machine learning with DeepChem
    Ramsundar, Bharath
    Leswing, Karl
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2019, 257
  • [39] Molecular machine learning with DeepChem
    Ramsundar, Bharath
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2018, 255