SMICLR: Contrastive Learning on Multiple Molecular Representations for Semisupervised and Unsupervised Representation Learning

Cited by: 17
Authors
Pinheiro, Gabriel A. [1 ]
Silva, Juarez L. F. [2 ]
Quiles, Marcos G. [1 ]
Affiliations
[1] Fed Univ Sao Paulo Unifesp, Inst Sci & Technol, BR-12247014 Sao Jose Dos Campos, SP, Brazil
[2] Univ Sao Paulo, Sao Carlos Inst Chem, BR-13560970 Sao Carlos, SP, Brazil
Funding
São Paulo Research Foundation, Brazil
Keywords
PREDICTION; NETWORKS; LANGUAGE; MODELS; SMILES;
DOI
10.1021/acs.jcim.2c00521
Chinese Library Classification
R914 [Medicinal Chemistry]
Discipline Code
100701
Abstract
Machine learning, as a tool for chemical space exploration, broadens the horizon for working with both known and unknown molecules. At its core lies molecular representation, an essential key to improving learning about structure-property relationships. Recently, contrastive frameworks have shown impressive results for representation learning in diverse domains. Therefore, this paper proposes a contrastive framework that embraces multimodal molecular data. Specifically, our approach jointly trains a graph encoder and an encoder for the simplified molecular-input line-entry system (SMILES) string under a contrastive learning objective. Since SMILES is the basis of our method, i.e., we build the molecular graph from the SMILES, we call our framework SMILES Contrastive Learning (SMICLR). When stacking a nonlinear regressor on SMICLR's pretrained encoder and fine-tuning the entire model, we reduced the prediction error by, on average, 44% and 25% for the energetic and electronic properties of the QM9 data set, respectively, over the supervised baseline. We further improved the framework's performance by applying data augmentations to each molecular input representation. Moreover, SMICLR achieved competitive representation learning results in an unsupervised setting.
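The abstract describes pulling a molecule's graph embedding and its SMILES embedding together while pushing apart embeddings of different molecules. A minimal sketch of such a cross-modal NT-Xent-style loss is shown below; the function name, the temperature value, and the use of NumPy in place of a deep-learning framework are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def ntxent_cross_modal(z_graph, z_smiles, temperature=0.1):
    """Cross-modal contrastive (NT-Xent-style) loss sketch.

    Each row i of z_graph and z_smiles embeds the same molecule, so
    the positives sit on the diagonal of the similarity matrix; all
    other pairs in the batch act as negatives.
    """
    # L2-normalize so dot products become cosine similarities
    zg = z_graph / np.linalg.norm(z_graph, axis=1, keepdims=True)
    zs = z_smiles / np.linalg.norm(z_smiles, axis=1, keepdims=True)
    logits = zg @ zs.T / temperature  # (N, N) similarity matrix

    # Numerically stable softmax cross-entropy with the diagonal
    # (matching graph/SMILES pair) as the correct class per row
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = ntxent_cross_modal(z, z)        # matched pairs: low loss
shuffled = ntxent_cross_modal(z, z[::-1])  # mismatched pairs: high loss
```

In a training loop, `z_graph` and `z_smiles` would come from the jointly trained graph and SMILES encoders, with the loss minimized over mini-batches of molecules.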
Pages: 3948-3960
Page count: 13