Learning Substructure Invariance for Out-of-Distribution Molecular Representations

Cited by: 0
Authors
Yang, Nianzu [1]
Zeng, Kaipeng [1]
Wu, Qitian [1]
Jia, Xiaosong [1]
Yan, Junchi [1, 2]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
[2] Shanghai AI Lab, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
HIV-1; INTEGRASE; IDENTIFICATION; DRUGS;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Molecule representation learning (MRL) has been extensively studied, and current methods have shown promising power for various tasks, e.g., molecular property prediction and target identification. However, a common hypothesis of existing methods is that either the model development or experimental evaluation is mostly based on i.i.d. data across training and testing. Such a hypothesis can be violated in real-world applications where testing molecules could come from new environments, bringing about serious performance degradation or unexpected predictions. We propose a new representation learning framework entitled MoleOOD to enhance the robustness of MRL models against such distribution shifts, motivated by an observation that the (bio)chemical properties of molecules are usually invariantly associated with certain privileged molecular substructures across different environments (e.g., scaffolds and sizes). Specifically, we introduce an environment inference model to identify the latent factors that impact data generation from different distributions in a fully data-driven manner. We also propose a new learning objective to guide the molecule encoder to leverage environment-invariant substructures that relate more stably to the labels across environments. Extensive experiments on ten real-world datasets demonstrate that our model has a stronger generalization ability than existing methods under various out-of-distribution (OOD) settings, despite the absence of manual specifications of environments. In particular, our method achieves up to 5.9% and 3.9% improvement over the strongest baselines on the OGB and DrugOOD benchmarks in terms of ROC-AUC, respectively. Our source code is publicly available at https://github.com/yangnianzu0515/MoleOOD.
Pages: 15
Related papers
50 in total
  • [31] Generalizing Reward Modeling for Out-of-Distribution Preference Learning
    Jia, Chen
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, PT V, ECML PKDD 2024, 2024, 14945 : 107 - 124
  • [32] Machine and deep learning performance in out-of-distribution regressions
    Shmuel, Assaf
    Glickman, Oren
    Lazebnik, Teddy
    Machine Learning: Science and Technology, 2024, 5 (04):
  • [33] Continual Evidential Deep Learning for Out-of-Distribution Detection
    Aguilar, Eduardo
    Raducanu, Bogdan
    Radeva, Petia
    van de Weijer, Joost
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 3436 - 3446
  • [34] Learning Causal Semantic Representation for Out-of-Distribution Prediction
    Liu, Chang
    Sun, Xinwei
    Wang, Jindong
    Tang, Haoyue
    Li, Tao
    Qin, Tao
    Chen, Wei
    Liu, Tie-Yan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [35] GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations
    Lust, Julia
    Condurache, Alexandru P.
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [36] Real-World Molecular Out-Of-Distribution: Specification and Investigation
    Tossou, Prudencio
    Wognum, Cas
    Craig, Michael
    Mary, Hadrien
    Noutahi, Emmanuel
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2024, 64 (03) : 697 - 711
  • [37] Self-Supervised Learning for Generalizable Out-of-Distribution Detection
    Mohseni, Sina
    Pitale, Mandar
    Yadawa, J. B. S.
    Wang, Zhangyang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5216 - 5223
  • [38] OUT-OF-DISTRIBUTION AS A TARGET CLASS IN SEMI-SUPERVISED LEARNING
    Tadros, Antoine
    Drouyer, Sebastien
    von Gioi, Rafael Grompone
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3249 - 3252
  • [39] Out-of-Distribution (OOD) Detection Based on Deep Learning: A Review
    Cui, Peng
    Wang, Jinjia
    ELECTRONICS, 2022, 11 (21)
  • [40] Improving Out-of-Distribution Detection by Learning from the Deployment Environment
    Inkawhich, Nathan
    Zhang, Jingyang
    Davis, Eric K.
    Luley, Ryan
    Chen, Yiran
    IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15 : 2070 - 2086