An effective self-supervised framework for learning expressive molecular global representations to drug discovery

Cited by: 69
Authors
Li, Pengyong [1 ]
Wang, Jun [2 ]
Qiao, Yixuan [3 ]
Chen, Hao [3 ]
Yu, Yihuan [4 ]
Yao, Xiaojun [5 ]
Gao, Peng [2 ]
Xie, Guotong [2 ]
Song, Sen [6 ]
Affiliations
[1] Tsinghua Univ, Dept Biomed Engn, Beijing, Peoples R China
[2] PingAn Healthcare Technol, Beijing, Peoples R China
[3] Beijing Univ Technol, Operat Res & Cybernet, Beijing, Peoples R China
[4] Beijing Univ Biomed Engn, Beijing, Peoples R China
[5] Lanzhou Univ, Analyt Chem & Chemoinformat, Lanzhou, Peoples R China
[6] Tsinghua Univ, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
molecular representation; deep learning; graph neural network; self-supervised learning; PREDICTION; DESCRIPTORS;
DOI
10.1093/bib/bbab109
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010 ; 081704 ;
Abstract
How to produce expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel graph-based deep learning framework for molecular pre-training, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose a powerful GNN for modeling molecular graphs, named MolGNet, and design an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet can capture valuable chemical insights and produce interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction and drug-target interaction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
Pages: 14