Multi-modal molecule structure-text model for text-based retrieval and editing

Cited by: 6
Authors
Liu, Shengchao [1 ,2 ]
Nie, Weili [3 ]
Wang, Chengpeng [4 ]
Lu, Jiarui [1 ,2 ]
Qiao, Zhuoran [5 ]
Liu, Ling [6 ]
Tang, Jian [1 ,7 ]
Xiao, Chaowei [3 ,8 ]
Anandkumar, Animashree [3 ,5 ]
Affiliations
[1] Mila Quebec Artificial Intelligence Inst, Montreal, PQ, Canada
[2] Univ Montreal, Montreal, PQ, Canada
[3] NVIDIA Res, Santa Clara, CA 95051 USA
[4] Univ Illinois, Champaign, IL USA
[5] CALTECH, Pasadena, CA 91125 USA
[6] Princeton Univ, Princeton, NJ USA
[7] HEC Montreal, Montreal, PQ, Canada
[8] Arizona State Univ, Tempe, AZ USA
Keywords
DRUG; SIMILARITY; DISCOVERY; CHEMISTRY; AREA; ZINC
DOI
10.1038/s42256-023-00759-6
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies mainly apply machine learning to the chemical structures of molecules while ignoring the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure-text model, MoleculeSTM, which jointly learns molecules' chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure-text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions: structure-text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains state-of-the-art generalization to novel biochemical concepts across various benchmarks.

Machine learning methods in cheminformatics have made great progress in using the chemical structures of molecules, but a large portion of textual information remains scarcely explored. Liu and colleagues train MoleculeSTM, a foundation model that aligns the structure and text modalities through contrastive learning, and show its utility on the downstream tasks of structure-text retrieval, text-guided editing and molecular property prediction.
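The contrastive alignment the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: it assumes a CLIP-style symmetric InfoNCE objective over a batch of paired embeddings, with the structure and text encoders replaced by random feature vectors; the actual MoleculeSTM encoders, dataset and loss details are in the paper.

```python
# Hypothetical sketch of a CLIP-style symmetric contrastive (InfoNCE) loss,
# the general alignment objective described in the abstract. NumPy only;
# real structure/text encoders are assumed and stubbed with random features.
import numpy as np

def contrastive_loss(struct_emb, text_emb, temperature=0.07):
    """Matched structure-text pairs (row i of each matrix) are pulled
    together; mismatched pairs within the batch are pushed apart."""
    # L2-normalize so dot products are cosine similarities
    s = struct_emb / np.linalg.norm(struct_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))  # true pairs lie on the diagonal

    def xent(lg):
        # cross-entropy of each row against its diagonal entry
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # symmetric: structure->text and text->structure directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
batch, dim = 8, 32
struct = rng.normal(size=(batch, dim))
# perfectly aligned embeddings should score a much lower loss than random ones
loss_aligned = contrastive_loss(struct, struct)
loss_random = contrastive_loss(struct, rng.normal(size=(batch, dim)))
```

In this sketch the temperature and batch size are arbitrary illustrative choices; the point is only the shape of the objective, in which the batch's own mismatched pairs serve as negatives.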
Pages: 1447-1457
Page count: 11
Related Papers
50 records in total
  • [1] Multi-modal molecule structure–text model for text-based retrieval and editing
    Shengchao Liu
    Weili Nie
    Chengpeng Wang
    Jiarui Lu
    Zhuoran Qiao
    Ling Liu
    Jian Tang
    Chaowei Xiao
    Animashree Anandkumar
    Nature Machine Intelligence, 2023, 5 : 1447 - 1457
  • [2] Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
    Zhao, Wangbo
    Wang, Kai
    Chu, Xiangxiang
    Xue, Fuzhao
    Wang, Xinchao
    You, Yang
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 11727 - 11736
  • [3] Multi-modal Broad Learning System for Medical Image and Text-based Classification
    Zhou, Yanhong
    Du, Jie
    Guan, Kai
    Wang, Tianfu
    2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC), 2021, : 3439 - 3442
  • [4] Modal Complementarity Based on Multimodal Large Language Model for Text-Based Person Retrieval
    Bao, Tong
    Xu, Tong
    Xu, Derong
    Zheng, Zhi
    WEB AND BIG DATA, APWEB-WAIM 2024, PT I, 2024, 14961 : 264 - 279
  • [5] SIAMCLIM: TEXT-BASED PEDESTRIAN SEARCH VIA MULTI-MODAL SIAMESE CONTRASTIVE LEARNING
    Huang, Runlin
    Wu, Shuyang
    Jie, Leiping
    Zuo, Xinxin
    Zhang, Hui
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1800 - 1804
  • [6] PaSeMix: A Multi-modal Partitional Semantic Data Augmentation Method for Text-Based Person Search
    Yuan, Xinpan
    Li, Jiabao
    Gan, Wenguang
    Xia, Wei
    Weng, Yanbin
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024, 2024, 14864 : 468 - 479
  • [7] Text-Video Retrieval via Multi-Modal Hypergraph Networks
    Li, Qian
    Su, Lixin
    Zhao, Jiashu
    Xia, Long
    Cai, Hengyi
    Cheng, Suqi
    Tang, Hengzhu
    Wang, Junfeng
    Yin, Dawei
    PROCEEDINGS OF THE 17TH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, WSDM 2024, 2024, : 369 - 377
  • [8] Deep Neural Architecture for Multi-Modal Retrieval based on Joint Embedding Space for Text and Images
    Balaneshin-kordan, Saeid
    Kotov, Alexander
    WSDM'18: PROCEEDINGS OF THE ELEVENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2018, : 28 - 36
  • [9] Automatic generation of multi-modal dialogue from text based on discourse structure analysis
    Prendinger, Helmut
    Piwek, Paul
    Ishizuka, Mitsuru
    ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 27 - +
  • [10] MULTI-MODAL LEARNING WITH TEXT MERGING FOR TEXTVQA
    Xu, Changsheng
    Xu, Zhenlong
    He, Yifan
    Zhou, Shuigeng
    Guan, Jihong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1985 - 1989