Multi-modal molecule structure-text model for text-based retrieval and editing

Cited by: 6
Authors
Liu, Shengchao [1 ,2 ]
Nie, Weili [3 ]
Wang, Chengpeng [4 ]
Lu, Jiarui [1 ,2 ]
Qiao, Zhuoran [5 ]
Liu, Ling [6 ]
Tang, Jian [1 ,7 ]
Xiao, Chaowei [3 ,8 ]
Anandkumar, Animashree [3 ,5 ]
Affiliations
[1] Mila Quebec Artificial Intelligence Inst, Montreal, PQ, Canada
[2] Univ Montreal, Montreal, PQ, Canada
[3] NVIDIA Res, Santa Clara, CA 95051 USA
[4] Univ Illinois, Champaign, IL USA
[5] CALTECH, Pasadena, CA 91125 USA
[6] Princeton Univ, Princeton, NJ USA
[7] HEC Montreal, Montreal, PQ, Canada
[8] Arizona State Univ, Tempe, AZ USA
Keywords
DRUG; SIMILARITY; DISCOVERY; CHEMISTRY; AREA; ZINC
DOI
10.1038/s42256-023-00759-6
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies mainly apply machine learning to the chemical structures of molecules while ignoring the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains state-of-the-art generalization to novel biochemical concepts across various benchmarks.

Machine learning methods in cheminformatics have made great progress in using the chemical structures of molecules, but a large portion of textual information remains scarcely explored. Liu and colleagues train MoleculeSTM, a foundation model that aligns the structure and text modalities through contrastive learning, and show its utility on the downstream tasks of structure-text retrieval, text-guided editing and molecular property prediction.
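The contrastive alignment the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: it assumes a CLIP-style symmetric InfoNCE objective over a batch of paired embeddings, with the structure and text encoders replaced by random feature vectors; the actual MoleculeSTM encoders, dataset and loss details are in the paper.

```python
# Hypothetical sketch of a CLIP-style symmetric contrastive (InfoNCE) loss,
# the general alignment objective described in the abstract. NumPy only;
# real structure/text encoders are assumed and stubbed with random features.
import numpy as np

def contrastive_loss(struct_emb, text_emb, temperature=0.07):
    """Matched structure-text pairs (row i of each matrix) are pulled
    together; mismatched pairs within the batch are pushed apart."""
    # L2-normalize so dot products are cosine similarities
    s = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))  # true pairs lie on the diagonal

    def xent(lg):
        # cross-entropy of each row against its diagonal entry
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # symmetric: structure->text and text->structure directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
batch, dim = 8, 32
struct = rng.normal(size=(batch, dim))
# perfectly aligned embeddings should score a much lower loss than random ones
loss_aligned = contrastive_loss(struct, struct)
loss_random = contrastive_loss(struct, rng.normal(size=(batch, dim)))
```

In this sketch the temperature and batch size are arbitrary illustrative choices; the point is only the shape of the objective, in which the batch's own mismatched pairs serve as negatives.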
Pages: 1447-1457
Page count: 11
Related Papers
50 records in total
  • [1] Multi-modal molecule structure–text model for text-based retrieval and editing
    Shengchao Liu
    Weili Nie
    Chengpeng Wang
    Jiarui Lu
    Zhuoran Qiao
    Ling Liu
    Jian Tang
    Chaowei Xiao
    Animashree Anandkumar
    Nature Machine Intelligence, 2023, 5 : 1447 - 1457
  • [2] Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
    Zhao, Wangbo
    Wang, Kai
    Chu, Xiangxiang
    Xue, Fuzhao
    Wang, Xinchao
    You, Yang
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11727 - 11736
  • [3] Multi-modal Broad Learning System for Medical Image and Text-based Classification
    Zhou, Yanhong
    Du, Jie
    Guan, Kai
    Wang, Tianfu
    2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC), 2021, : 3439 - 3442
  • [4] Modal Complementarity Based on Multimodal Large Language Model for Text-Based Person Retrieval
    Bao, Tong
    Xu, Tong
    Xu, Derong
    Zheng, Zhi
    WEB AND BIG DATA, APWEB-WAIM 2024, PT I, 2024, 14961 : 264 - 279
  • [5] SIAMCLIM: TEXT-BASED PEDESTRIAN SEARCH VIA MULTI-MODAL SIAMESE CONTRASTIVE LEARNING
    Huang, Runlin
    Wu, Shuyang
    Jie, Leiping
    Zuo, Xinxin
    Zhang, Hui
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1800 - 1804
  • [6] PaSeMix: A Multi-modal Partitional Semantic Data Augmentation Method for Text-Based Person Search
    Yuan, Xinpan
    Li, Jiabao
    Gan, Wenguang
    Xia, Wei
    Weng, Yanbin
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14864 : 468 - 479
  • [7] Text-Video Retrieval via Multi-Modal Hypergraph Networks
    Li, Qian
    Su, Lixin
    Zhao, Jiashu
    Xia, Long
    Cai, Hengyi
    Cheng, Suqi
    Tang, Hengzhu
    Wang, Junfeng
    Yin, Dawei
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 369 - 377
  • [8] Deep Neural Architecture for Multi-Modal Retrieval based on Joint Embedding Space for Text and Images
    Balaneshin-kordan, Saeid
    Kotov, Alexander
    WSDM'18: PROCEEDINGS OF THE ELEVENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2018, : 28 - 36
  • [9] Automatic generation of multi-modal dialogue from text based on discourse structure analysis
    Prendinger, Helmut
    Piwek, Paul
    Ishizuka, Mitsuru
    ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 27 - +
  • [10] MULTI-MODAL LEARNING WITH TEXT MERGING FOR TEXTVQA
    Xu, Changsheng
    Xu, Zhenlong
    He, Yifan
    Zhou, Shuigeng
    Guan, Jihong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1985 - 1989