Multi-modal molecule structure–text model for text-based retrieval and editing

Cited by: 0
Authors
Shengchao Liu
Weili Nie
Chengpeng Wang
Jiarui Lu
Zhuoran Qiao
Ling Liu
Jian Tang
Chaowei Xiao
Animashree Anandkumar
Affiliations
[1] Mila-Québec Artificial Intelligence Institute
[2] Université de Montréal
[3] NVIDIA Research
[4] University of Illinois Urbana-Champaign
[5] California Institute of Technology
[6] Princeton University
[7] HEC Montréal
[8] Arizona State University
DOI: not available
Abstract
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure–text model, MoleculeSTM, by jointly learning molecules’ chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure–text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure–text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks.
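The abstract describes jointly training a molecule encoder and a text encoder with a contrastive learning strategy so that paired chemical structures and descriptions land close together in a shared embedding space. As an illustration only (not the paper's exact implementation), a minimal NumPy sketch of a CLIP-style symmetric InfoNCE objective over a batch of structure–text pairs might look like this; the function names and the temperature value are assumptions for the example:

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log-sum-exp along the given axis."""
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def contrastive_loss(mol_emb, text_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired molecule/text
    embeddings: the matched pair (row i of each matrix) is pulled
    together while all other in-batch pairs are pushed apart."""
    # L2-normalize so the dot products below are cosine similarities.
    mol = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = mol @ txt.T / temperature  # (batch, batch) similarity matrix

    n = logits.shape[0]
    # Molecule -> text retrieval: each row should "classify" its diagonal entry.
    m2t = logits - logsumexp(logits, axis=1)
    # Text -> molecule retrieval: the same objective over columns.
    t2m = logits - logsumexp(logits, axis=0)
    return -(np.trace(m2t) + np.trace(t2m)) / (2 * n)
```

Minimizing this loss is what makes the zero-shot retrieval task possible: at test time a text query is embedded once and ranked against candidate molecule embeddings by cosine similarity.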
Pages: 1447–1457
Page count: 10
Related papers (50 in total)
  • [21] Guo, Ji; Chen, Peihong; Jiang, Wenbo; Lu, Guoming. TrojanEdit: Backdooring Text-Based Image Editing Models. arXiv.
  • [22] Fried, Ohad; Tewari, Ayush; Zollhofer, Michael; Finkelstein, Adam; Shechtman, Eli; Goldman, Dan B.; Genova, Kyle; Jin, Zeyu; Theobalt, Christian; Agrawala, Maneesh. Text-based Editing of Talking-head Video. ACM Transactions on Graphics, 2019, 38(4).
  • [23] Ye, Jiayu; Yu, Yanhong; Wang, Qingxiang; Li, Wentao; Liang, Hu; Zheng, Yunshao; Fu, Gang. Multi-modal depression detection based on emotional audio and evaluation text. Journal of Affective Disorders, 2021, 295: 904–913.
  • [24] Fan, Lin; Gong, Xun; Zheng, Cen-Yang. A Multi-Modal Medical Image Analysis Algorithm Based on Text Guidance. Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2024, 52(7): 2341–2355.
  • [25] Li, Wen; Duan, Lixin; Xu, Dong; Tsang, Ivor Wai-Hung. Text-Based Image Retrieval using Progressive Multi-Instance Learning. 2011 IEEE International Conference on Computer Vision (ICCV), 2011: 2049–2055.
  • [26] Aspura, Yanti Idaya M. K.; Noah, Shahrul Azman Mohd. Semantic text-based image retrieval with multi-modality ontology and DBpedia. Electronic Library, 2017, 35(6): 1191–1214.
  • [27] Cromley, Jennifer G.; Kunze, Andrea J.; Dane, Aygul Parpucu. Multi-text multi-modal reading processes and comprehension. Learning and Instruction, 2021, 71.
  • [28] Hua, Yan; Yang, Yingyun; Du, Jianhe. Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval. Electronics, 2020, 9(3).
  • [29] Mafla, Andres; Dey, Sounak; Biten, Ali Furkan; Gomez, Lluis; Karatzas, Dimosthenis. Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), 2021: 4022–4032.
  • [30] Li, Ailin; Zhao, Lei; Zuo, Zhiwen; Wang, Zhizhong; Xing, Wei; Lu, Dongming. MIGT: Multi-modal image inpainting guided with text. Neurocomputing, 2023, 520: 376–385.