Multi-modal molecule structure–text model for text-based retrieval and editing

Authors
Shengchao Liu
Weili Nie
Chengpeng Wang
Jiarui Lu
Zhuoran Qiao
Ling Liu
Jian Tang
Chaowei Xiao
Animashree Anandkumar
Affiliations
[1] Mila-Québec Artificial Intelligence Institute
[2] Université de Montréal
[3] NVIDIA Research
[4] University of Illinois Urbana-Champaign
[5] California Institute of Technology
[6] Princeton University
[7] HEC Montréal
[8] Arizona State University
Abstract
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure–text model, MoleculeSTM, by jointly learning molecules’ chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure–text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure–text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
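The contrastive alignment of structure and text embeddings described in the abstract, and the zero-shot retrieval it enables, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a generic symmetric InfoNCE-style objective over cosine similarities, which may differ from the loss actually used for MoleculeSTM, and every name here (`contrastive_loss`, `retrieve`, `temperature`) and the toy 2-dimensional embeddings are hypothetical. Real embeddings would come from trained molecule and text encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(struct_embs, text_embs):
    """Cosine similarity between every structure embedding and every text embedding."""
    s = [l2_normalize(v) for v in struct_embs]
    t = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(si, tj)) for tj in t] for si in s]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(struct_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed form): pair i of each list is a positive pair."""
    sim = similarity_matrix(struct_embs, text_embs)
    logits = [[x / temperature for x in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-structure direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

def retrieve(text_emb, struct_embs):
    """Zero-shot retrieval: index of the structure most similar to a text query."""
    scores = [row[0] for row in similarity_matrix(struct_embs, [text_emb])]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy data: two "molecules" and their matching "descriptions" (hypothetical values).
structures = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[0.9, 0.1], [0.1, 0.9]]
texts_shuffled = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = contrastive_loss(structures, texts_aligned)
loss_shuffled = contrastive_loss(structures, texts_shuffled)
best_match = retrieve([0.1, 0.9], structures)
```

In this toy setting the loss is small when each structure sits closest to its own description and large when the pairs are shuffled, which is the signal a contrastive objective trains on. Retrieval then reduces to a nearest-neighbour lookup in the shared embedding space.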
Pages: 1447-1457
Page count: 10
Source
Nature Machine Intelligence, 2023, 5 (12): 1447-1457