An Unsupervised Machine Learning Approach for the Automatic Construction of Local Chemical Descriptors

Cited by: 3
Authors
Gallegos, Miguel [1]
Isamura, Bienfait Kabuyaya [2]
Popelier, Paul L. A. [2]
Pendas, Angel Martin [1]
Affiliations
[1] Univ Oviedo, Dept Analyt & Phys Chem, E-33006 Oviedo, Spain
[2] Univ Manchester, Dept Chem, Manchester M13 9PL, England
Funding
UK Research and Innovation (UKRI); European Research Council (ERC)
Keywords
MOLECULAR DESCRIPTORS; POTENTIALS; INSIGHTS; FEATURES; MODELS; QSAR;
DOI
10.1021/acs.jcim.3c01906
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. 
Finally, the effectiveness of finely tuned features was checked across different architectures, unveiling the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. Finally, this work opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.
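The clustering-to-descriptor idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything specific in it is an assumption. The distances are synthetic. A small hand-written expectation-maximization routine (the hypothetical fit_gmm_1d) stands in for a library GMM. Each cluster mean is used as the radial shift r_s and each cluster variance sets the width through eta = 1 / (2 * variance), a mapping chosen here for illustration only. The 6.0 cutoff radius is arbitrary. The radial function itself follows the standard Behler-Parrinello form with a cosine cutoff.

```python
import math
import random


def fit_gmm_1d(samples, k, iters=100):
    """Fit a k-component 1D Gaussian mixture with plain expectation-maximization."""
    n = len(samples)
    ordered = sorted(samples)
    # Deterministic start: means at evenly spaced quantiles, shared global variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(samples) / n
    spread = max(sum((x - centre) ** 2 for x in samples) / n, 1e-6)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            dens = [w * math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update the weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, samples)) / nj, 1e-6)
            weights[j] = nj / n
    # Return (mean, variance, weight) tuples ordered by increasing mean.
    return sorted(zip(means, variances, weights))


def cutoff(r, r_c):
    """Cosine cutoff that decays smoothly to zero at r_c."""
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0) if r < r_c else 0.0


def radial_acsf(distances, r_s, eta, r_c):
    """Radial symmetry function: sum of shifted Gaussians times the cutoff."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c) for r in distances)


# Synthetic interatomic distances (assumed data): two shells near 1.0 and 2.4.
rng = random.Random(42)
distances = ([rng.gauss(1.0, 0.05) for _ in range(300)]
             + [rng.gauss(2.4, 0.15) for _ in range(300)])

components = fit_gmm_1d(distances, k=2)

# Assumed mapping: cluster mean -> r_s, cluster variance -> eta = 1 / (2 * variance).
acsf_params = [(mean, 1.0 / (2.0 * var)) for mean, var, _ in components]

# Featurize one hypothetical atomic environment given its neighbour distances.
neighbours = [1.02, 0.98, 2.35, 2.60]
features = [radial_acsf(neighbours, r_s, eta, r_c=6.0) for r_s, eta in acsf_params]

print(acsf_params)
print(features)
```

In practice a library mixture model and a criterion for choosing the number of components would replace the fixed k=2 used here. The sketch covers only the radial case and does not attempt the split of each cluster into lower-tail and upper-tail functions that the abstract reports as the best performing variant.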
Pages: 3059-3079
Page count: 21