SMICLR: Contrastive Learning on Multiple Molecular Representations for Semisupervised and Unsupervised Representation Learning

Cited by: 17
Authors
Pinheiro, Gabriel A. [1 ]
Silva, Juarez L. F. [2 ]
Quiles, Marcos G. [1 ]
Institutions
[1] Fed Univ Sao Paulo Unifesp, Inst Sci & Technol, BR-12247014 Sao Jose Dos Campos, SP, Brazil
[2] Univ Sao Paulo, Sao Carlos Inst Chem, BR-13560970 Sao Carlos, SP, Brazil
Funding
Sao Paulo Research Foundation (FAPESP), Brazil
Keywords
PREDICTION; NETWORKS; LANGUAGE; MODELS; SMILES;
DOI
10.1021/acs.jcim.2c00521
CLC Classification
R914 [Medicinal Chemistry]
Subject Code
100701
Abstract
Machine learning as a tool for chemical space exploration broadens horizons to work with known and unknown molecules. At its core lies molecular representation, an essential key to improve learning about structure-property relationships. Recently, contrastive frameworks have been showing impressive results for representation learning in diverse domains. Therefore, this paper proposes a contrastive framework that embraces multimodal molecular data. Specifically, our approach jointly trains a graph encoder and an encoder for the simplified molecular-input line-entry system (SMILES) string to perform the contrastive learning objective. Since SMILES is the basis of our method, i.e., we built the molecular graph from the SMILES, we call our framework SMILES Contrastive Learning (SMICLR). When stacking a nonlinear regressor on the SMICLR's pretrained encoder and fine-tuning the entire model, we reduced the prediction error by, on average, 44% and 25% for the energetic and electronic properties of the QM9 data set, respectively, over the supervised baseline. We further improved our framework's performance when applying data augmentations in each molecular-input representation. Moreover, SMICLR demonstrated competitive representation learning results in an unsupervised setting.
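The contrastive objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration of an NT-Xent-style loss over paired embeddings from a graph encoder and a SMILES encoder, where matching rows (two views of the same molecule) are positives and all other pairs are negatives; the function name and the use of NumPy are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def nt_xent_loss(z_graph, z_smiles, temperature=0.1):
    """Contrastive (NT-Xent-style) loss between two views of a batch of
    molecules: one embedding per molecule from a graph encoder (z_graph)
    and one from a SMILES encoder (z_smiles), both of shape (N, D).
    Row i of each matrix is the positive pair; all other rows are negatives."""
    # L2-normalize so dot products become cosine similarities
    a = z_graph / np.linalg.norm(z_graph, axis=1, keepdims=True)
    b = z_smiles / np.linalg.norm(z_smiles, axis=1, keepdims=True)
    sim = a @ b.T / temperature  # (N, N) scaled similarity matrix

    def xent(logits):
        # Cross-entropy with the diagonal (matching pair) as the target class
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Symmetrize: graph-to-SMILES and SMILES-to-graph directions
    return 0.5 * (xent(sim) + xent(sim.T))
```

In the paper's semisupervised setting, a regressor is then stacked on the pretrained encoder and the whole model is fine-tuned; the loss above covers only the contrastive pretraining stage.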
Pages: 3948-3960
Page count: 13
Related Papers
50 results
  • [1] Learning to Perturb for Contrastive Learning of Unsupervised Sentence Representations
    Zhou, Kun
    Zhou, Yuanhang
    Zhao, Wayne Xin
    Wen, Ji-Rong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3935 - 3944
  • [2] Kalman contrastive unsupervised representation learning
    Jahani Yekta, Mohammad Mahdi
    SCIENTIFIC REPORTS, 2024, 14 (1)
  • [3] CURL: Contrastive Unsupervised Representations for Reinforcement Learning
    Laskin, Michael
    Srinivas, Aravind
    Abbeel, Pieter
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [4] Debiased Contrastive Learning of Unsupervised Sentence Representations
    Zhou, Kun
    Zhang, Beichen
    Zhao, Wayne Xin
    Wen, Ji-Rong
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 6120 - 6130
  • [5] UNSUPERVISED CONTRASTIVE LEARNING OF SOUND EVENT REPRESENTATIONS
    Fonseca, Eduardo
    Ortego, Diego
    McGuinness, Kevin
    O'Connor, Noel E.
    Serra, Xavier
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 371 - 375
  • [6] A Theoretical Analysis of Contrastive Unsupervised Representation Learning
    Arora, Sanjeev
    Khandeparkar, Hrishikesh
    Khodak, Mikhail
    Plevrakis, Orestis
    Saunshi, Nikunj
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [7] Relative Contrastive Loss for Unsupervised Representation Learning
    Tang, Shixiang
    Zhu, Feng
    Bai, Lei
    Zhao, Rui
    Ouyang, Wanli
    COMPUTER VISION - ECCV 2022, PT XXVII, 2022, 13687 : 1 - 18
  • [8] Semisupervised Representation Contrastive Learning for Massive MIMO Fingerprint Positioning
    Gong, Xinrui
    Lu, An-An
    Fu, Xiao
    Liu, Xiaofeng
    Gao, Xiqi
    Xia, Xiang-Gen
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (08): : 14870 - 14885
  • [9] Semisupervised Machine Fault Diagnosis Fusing Unsupervised Graph Contrastive Learning
    Yang, Chaoying
    Liu, Jie
    Zhou, Kaibo
    Jiang, Xingxing
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2023, 19 (08) : 8644 - 8653
  • [10] DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
    Giorgi, John
    Nitski, Osvald
    Wang, Bo
    Bader, Gary
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 879 - 895