Multilingual Molecular Representation Learning via Contrastive Pre-training

Cited by: 0
Authors
Guo, Zhihui [1 ]
Sharma, Pramod [1 ]
Martinez, Andy [1 ]
Du, Liang [1 ]
Abraham, Robin [1 ]
Affiliations
[1] Microsoft Corp, Redmond, WA 98052 USA
Keywords
DESCRIPTORS; SIMILARITY;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Molecular representation learning plays an essential role in cheminformatics. Recently, language-model-based approaches have gained popularity as an alternative to traditional expert-designed features for encoding molecules. However, these approaches use only a single molecular language for representation learning. Motivated by the fact that a given molecule can be described in different languages, such as the Simplified Molecular-Input Line-Entry System (SMILES), International Union of Pure and Applied Chemistry (IUPAC) nomenclature, and the IUPAC International Chemical Identifier (InChI), we propose a multilingual molecular embedding generation approach called MM-Deacon (multilingual molecular domain embedding analysis via contrastive learning). MM-Deacon is pre-trained on large-scale molecule data using SMILES and IUPAC as two different languages. We evaluated the robustness of our method on seven molecular property prediction tasks from the MoleculeNet benchmark, zero-shot cross-lingual retrieval, and a drug-drug interaction prediction task.
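To make the idea of cross-lingual contrastive pre-training concrete, the sketch below shows one common way such an objective can be set up: one encoder per molecular language (SMILES and IUPAC), mean-pooled and normalized embeddings, and a symmetric InfoNCE loss in which matched SMILES/IUPAC pairs of the same molecule are positives and all other in-batch pairs are negatives. This is a minimal illustration only, not MM-Deacon's actual implementation; the encoder architecture, pooling, temperature, and all names (TextEncoder, contrastive_loss) are assumptions.

```python
# Minimal sketch (not the authors' code) of contrastive pre-training between
# two "molecular languages" (e.g., SMILES and IUPAC). Assumes an InfoNCE-style
# objective with in-batch negatives; all names and hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextEncoder(nn.Module):
    """Tiny stand-in for a transformer encoder over one molecular language."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))          # (batch, length, dim)
        pooled = h.mean(dim=1)                           # mean-pool over tokens
        return F.normalize(self.proj(pooled), dim=-1)    # unit-length embedding


def contrastive_loss(z_smiles: torch.Tensor, z_iupac: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched SMILES/IUPAC pairs are positives,
    all other in-batch combinations serve as negatives."""
    logits = z_smiles @ z_iupac.t() / temperature        # (batch, batch) similarities
    targets = torch.arange(z_smiles.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


if __name__ == "__main__":
    smiles_enc = TextEncoder(vocab_size=100)
    iupac_enc = TextEncoder(vocab_size=200)
    # Fake tokenized batch: 8 molecules, each described in both languages.
    smiles_ids = torch.randint(1, 100, (8, 32))
    iupac_ids = torch.randint(1, 200, (8, 48))
    loss = contrastive_loss(smiles_enc(smiles_ids), iupac_enc(iupac_ids))
    print(f"contrastive loss: {loss.item():.4f}")
```

After pre-training with such an objective, either encoder can be used on its own to embed molecules for downstream tasks (e.g., property prediction or cross-lingual retrieval), since the loss pulls the two languages' embeddings of the same molecule together in a shared space.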
Pages: 3441-3453
Page count: 13