An effective self-supervised framework for learning expressive molecular global representations to drug discovery

Cited by: 69
Authors
Li, Pengyong [1]
Wang, Jun [2]
Qiao, Yixuan [3]
Chen, Hao [3]
Yu, Yihuan [4]
Yao, Xiaojun [5]
Gao, Peng [2]
Xie, Guotong [2]
Song, Sen [6]
Affiliations
[1] Tsinghua Univ, Dept Biomed Engn, Beijing, Peoples R China
[2] PingAn Healthcare Technol, Beijing, Peoples R China
[3] Beijing Univ Technol, Operat Res & Cybernet, Beijing, Peoples R China
[4] Beijing Univ Biomed Engn, Beijing, Peoples R China
[5] Lanzhou Univ, Analyt Chem & Chemoinformat, Lanzhou, Peoples R China
[6] Tsinghua Univ, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
molecular representation; deep learning; graph neural network; self-supervised learning; PREDICTION; DESCRIPTORS;
DOI
10.1093/bib/bbab109
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Producing expressive molecular representations is a fundamental challenge in artificial intelligence-driven drug discovery. Graph neural networks (GNNs) have emerged as a powerful technique for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and poor generalization capability. Here, we propose a novel graph-based deep learning framework for molecular pre-training, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we propose a powerful GNN for modeling molecular graphs, named MolGNet, and design an effective self-supervised strategy for pre-training the model at both the node and graph levels. After pre-training on 11 million unlabeled molecules, we show that MolGNet can capture valuable chemical insights and produce interpretable representations. The pre-trained MolGNet can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of drug discovery tasks, including molecular property prediction, drug-drug interaction prediction and drug-target interaction prediction, on 14 benchmark datasets. The pre-trained MolGNet in MPG has the potential to become an advanced molecular encoder in the drug discovery pipeline.
Pages: 14