An Unsupervised Machine Learning Approach for the Automatic Construction of Local Chemical Descriptors

Cited by: 3
Authors
Gallegos, Miguel [1]
Isamura, Bienfait Kabuyaya [2]
Popelier, Paul L. A. [2]
Pendas, Angel Martin [1]
Affiliations
[1] Univ Oviedo, Dept Analyt & Phys Chem, E-33006 Oviedo, Spain
[2] Univ Manchester, Dept Chem, Manchester M13 9PL, England
Funding
UK Research and Innovation; European Research Council
Keywords
MOLECULAR DESCRIPTORS; POTENTIALS; INSIGHTS; FEATURES; MODELS; QSAR;
DOI
10.1021/acs.jcim.3c01906
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Condensing the many physical variables defining a chemical system into a fixed-size array poses a significant challenge in the development of chemical Machine Learning (ML). Atom Centered Symmetry Functions (ACSFs) offer an intuitive featurization approach by means of a tedious and labor-intensive selection of tunable parameters. In this work, we implement an unsupervised ML strategy relying on a Gaussian Mixture Model (GMM) to automatically optimize the ACSF parameters. GMMs effortlessly decompose the vastness of the chemical and conformational spaces into well-defined radial and angular clusters, which are then used to build tailor-made ACSFs. The unsupervised exploration of the space has demonstrated general applicability across a diverse range of systems, spanning from various unimolecular landscapes to heterogeneous databases. The impact of the sampling technique and temperature on space exploration is also addressed, highlighting the particularly advantageous role of high-temperature Molecular Dynamics (MD) simulations. The reliability of the resulting features is assessed through the estimation of the atomic charges of a prototypical capped amino acid and a heterogeneous collection of CHON molecules. The automatically constructed ACSFs serve as high-quality descriptors, consistently yielding typical prediction errors below 0.010 electrons bound for the reported atomic charges. Altering the spatial distribution of the functions with respect to the cluster highlights the critical role of symmetry rupture in achieving significantly improved features. More specifically, using two separate functions to describe the lower and upper tails of the cluster results in the best performing models with errors as low as 0.006 electrons. 
Finally, the effectiveness of finely tuned features was checked across different architectures, unveiling the superior performance of Gaussian Process (GP) models over Feed Forward Neural Networks (FFNNs), particularly in low-data regimes, with nearly a 2-fold increase in prediction quality. Altogether, this approach paves the way toward an easier construction of local chemical descriptors, while providing valuable insights into how radial and angular spaces should be mapped. Finally, this work opens the possibility of encoding many-body information beyond angular terms into upcoming ML features.
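The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It fits a one-dimensional Gaussian mixture to synthetic interatomic distances with a plain expectation-maximization loop and turns each cluster into the parameters of a radial symmetry function of the standard Behler-Parrinello form, G = Σ_j exp(-η (r_ij - r_s)²) f_c(r_ij), with a cosine cutoff. The mapping r_s = cluster mean and η = 1/(2σ²), the synthetic distances, the 6.0 Å cutoff, and all function names are assumptions made for illustration. In practice a library implementation such as scikit-learn's `GaussianMixture` would normally replace the hand-written loop.

```python
import math
import random


def fit_gmm_1d(samples, n_components, n_iter=200):
    """Fit a 1D Gaussian mixture by plain EM; returns (weights, means, variances)."""
    n = len(samples)
    ordered = sorted(samples)
    # Deterministic initialisation: means at evenly spaced quantiles of the data.
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    grand_mean = sum(samples) / n
    overall_var = sum((x - grand_mean) ** 2 for x in samples) / n
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for k in range(n_components):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, samples)) / nk
            variances[k] = max(var, 1e-6)  # floor avoids collapsing components
    return weights, means, variances


def cosine_cutoff(r, r_cut):
    """Cosine cutoff function f_c(r), zero beyond r_cut."""
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0) if r <= r_cut else 0.0


def radial_acsf(distances, r_s, eta, r_cut):
    """Radial symmetry function value for one atom given its neighbour distances."""
    return sum(math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_cut) for r in distances)


# Synthetic (hypothetical) distance distribution in angstrom: two clusters.
rng = random.Random(42)
distances = [rng.gauss(1.1, 0.05) for _ in range(300)]
distances += [rng.gauss(2.4, 0.15) for _ in range(300)]

weights, means, variances = fit_gmm_1d(distances, n_components=2)

# Assumed mapping: one radial function per cluster, centred on the cluster mean,
# with a width set by the cluster variance.
params = sorted((m, 1.0 / (2.0 * v)) for m, v in zip(means, variances))

neighbours = [1.09, 1.12, 2.35, 2.50]  # hypothetical neighbour distances of one atom
features = [radial_acsf(neighbours, r_s, eta, r_cut=6.0) for r_s, eta in params]
```

Here `features` is a fixed-size vector with one entry per cluster, regardless of how many neighbours the atom has. The abstract also reports that splitting each cluster into separate functions for its lower and upper tails gave the best models; this sketch places a single symmetric function per cluster and does not reproduce that variant.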
Pages: 3059-3079
Number of pages: 21
Related papers
50 in total
  • [41] AUTOMATIC THESAURUS CONSTRUCTION BY MACHINE LEARNING FROM RETRIEVAL SESSIONS
    GUNTZER, U
    JUTTNER, G
    SEEGMULLER, G
    SARRE, F
    INFORMATION PROCESSING & MANAGEMENT, 1989, 25 (03) : 265 - 273
  • [42] Automatic aerospace weld inspection using unsupervised local deep feature learning
    Dong, Xinghui
    Taylor, Chris J.
    Cootes, Tim F.
    KNOWLEDGE-BASED SYSTEMS, 2021, 221
  • [43] UNSUPERVISED LEARNING APPROACH TO FEATURE ANALYSIS FOR AUTOMATIC SPEECH EMOTION RECOGNITION
    Eskimez, Sefik Emre
    Duan, Zhiyao
    Heinzelman, Wendi
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5099 - 5103
  • [44] Machine learning glass transition temperature of polyacrylamides using quantum chemical descriptors
    Zhang, Yun
    Xu, Xiaojie
    POLYMER CHEMISTRY, 2021, 12 (06) : 843 - 851
  • [45] Evaluation of Local Descriptors for Automatic Image Annotation
    Lenc, Ladislav
    ICAART: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2, 2017, : 527 - 534
  • [46] SeqSeg: Learning Local Segments for Automatic Vascular Model Construction
    Cepero, Numi Sveinsson
    Shadden, Shawn C.
    ANNALS OF BIOMEDICAL ENGINEERING, 2025, 53 (01) : 158 - 179
  • [47] Unsupervised machine learning for identifying key risk factors contributing to construction delays
    Al-Bataineh, Fuad
    Khatatbeh, Ahmed Ali
    Alzubi, Yazan
    ORGANIZATION TECHNOLOGY AND MANAGEMENT IN CONSTRUCTION, 2024, 16 (01): : 170 - 185
  • [48] Unsupervised Extreme Learning Machine via Structured Graph Construction for Data Clustering
    Zhang, Leijie
    Zhu, Xin
    Peng, Yong
    Kong, Wanzeng
    2018 11TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2018, : 33 - 36
  • [49] Unsupervised Learning Based on Multiple Descriptors for WSIs Diagnosis
    Sheikh, Taimoor Shakeel
    Kim, Jee-Yeon
    Shim, Jaesool
    Cho, Migyung
    DIAGNOSTICS, 2022, 12 (06)
  • [50] A machine learning approach to automatic music genre classification
    Silla, Carlos N.
    Koerich, Alessandro L.
    Kaestner, Celso A. A.
    Journal of the Brazilian Computer Society, 2008, 14 (03) : 7 - 18