Learning Substructure Invariance for Out-of-Distribution Molecular Representations

Cited by: 0
Authors
Yang, Nianzu [1 ]
Zeng, Kaipeng [1 ]
Wu, Qitian [1 ]
Jia, Xiaosong [1 ]
Yan, Junchi [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
HIV-1; INTEGRASE; IDENTIFICATION; DRUGS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Molecule representation learning (MRL) has been extensively studied, and current methods have shown promising power for various tasks, e.g., molecular property prediction and target identification. However, a common hypothesis of existing methods is that both model development and experimental evaluation are mostly based on i.i.d. data across training and testing. This hypothesis can be violated in real-world applications, where testing molecules may come from new environments, leading to serious performance degradation or unexpected predictions. We propose a new representation learning framework, entitled MoleOOD, to enhance the robustness of MRL models against such distribution shifts, motivated by the observation that the (bio)chemical properties of molecules are usually invariantly associated with certain privileged molecular substructures across different environments (e.g., scaffolds, sizes). Specifically, we introduce an environment inference model to identify, in a fully data-driven manner, the latent factors that impact data generation across different distributions. We also propose a new learning objective to guide the molecule encoder to leverage environment-invariant substructures that relate more stably to the labels across environments. Extensive experiments on ten real-world datasets demonstrate that our model generalizes better than existing methods under various out-of-distribution (OOD) settings, despite the absence of manually specified environments. In particular, our method achieves up to 5.9% and 3.9% improvement in ROC-AUC over the strongest baselines on the OGB and DrugOOD benchmarks, respectively. Our source code is publicly available at https://github.com/yangnianzu0515/MoleOOD.
Pages: 15
Related Papers (50 total)
  • [1] Learning Invariant Graph Representations for Out-of-Distribution Generalization
    Li, Haoyang
    Zhang, Ziwei
    Wang, Xin
    Zhu, Wenwu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [2] Learning Causally Invariant Representations for Out-of-Distribution Generalization on Graphs
    Chen, Yongqiang
    Zhang, Yonggang
    Bian, Yatao
    Yang, Han
    Ma, Kaili
    Xie, Binghui
    Liu, Tongliang
    Han, Bo
    Cheng, James
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [3] Out-of-distribution Detection Learning with Unreliable Out-of-distribution Sources
    Zheng, Haotian
    Wang, Qizhou
    Fang, Zhen
    Xia, Xiaobo
    Liu, Feng
    Liu, Tongliang
    Han, Bo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [4] Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization
    Qi, Jiaxin
    Tang, Kaihua
    Sun, Qianru
    Hua, Xian-Sheng
    Zhang, Hanwang
    COMPUTER VISION, ECCV 2022, PT XXV, 2022, 13685 : 92 - 109
  • [5] Learning Numerosity Representations with Transformers: Number Generation Tasks and Out-of-Distribution Generalization
    Boccato, Tommaso
    Testolin, Alberto
    Zorzi, Marco
    ENTROPY, 2021, 23 (07)
  • [6] Learning on Graphs with Out-of-Distribution Nodes
    Song, Yu
    Wang, Donglin
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 1635 - 1645
  • [7] SIREN: Shaping Representations for Detecting Out-of-Distribution Objects
    Du, Xuefeng
    Gozum, Gabriel
    Ming, Yifei
    Li, Yixuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [8] DMR: Disentangling Marginal Representations for Out-of-Distribution Detection
    Choi, Dasol
    Na, Dongbin
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2024, : 4032 - 4041
  • [9] Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization
    Ahuja, Kartik
    Caballero, Ethan
    Zhang, Dinghuai
    Gagnon-Audet, Jean-Christophe
    Bengio, Yoshua
    Mitliagkas, Ioannis
    Rish, Irina
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [10] Out-of-Distribution Detection using Multiple Semantic Label Representations
    Shalev, Gabi
    Adi, Yossi
    Keshet, Joseph
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31