Learning Invariant Molecular Representation in Latent Discrete Space

Cited by: 0
Authors
Zhuang, Xiang [1,2,3]
Zhang, Qiang [1,2,3]
Ding, Keyan [2]
Bian, Yatao [4]
Wang, Xiao [5]
Lv, Jingsong [6]
Chen, Hongyang [6]
Chen, Huajun [1,2,3]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] ZJU Hangzhou Global Sci & Technol Innovat Ctr, Hangzhou, Peoples R China
[3] Zhejiang Univ Ant Grp Joint Lab Knowledge Graph, Hangzhou, Peoples R China
[4] Tencent AI Lab, Shenzhen, Peoples R China
[5] Beihang Univ, Sch Software, Beijing, Peoples R China
[6] Zhejiang Lab, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DESIGN;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Molecular representation learning lays the foundation for drug discovery. However, existing methods suffer from poor out-of-distribution (OOD) generalization, particularly when training and test data originate from different environments. To address this issue, we propose a new framework for learning molecular representations that are invariant and robust to distribution shifts. Specifically, we propose a "first-encoding-then-separation" strategy that identifies invariant molecular features in the latent space, departing from conventional practice. Prior to the separation step, we introduce a residual vector quantization module that mitigates over-fitting to the training data distribution while preserving the expressivity of the encoder. Furthermore, we design a task-agnostic self-supervised learning objective that encourages precise invariance identification, which makes our method applicable to a wide variety of tasks, such as regression and multi-label classification. Extensive experiments on 18 real-world molecular datasets demonstrate that our model generalizes better than state-of-the-art baselines in the presence of various distribution shifts. Our code is available at https://github.com/HICAI-ZJU/iMoLD.
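The residual vector quantization step described in the abstract can be sketched in a few lines. Below is a minimal, hypothetical PyTorch illustration of generic residual VQ with a straight-through gradient estimator; the class name, stage count, codebook size, and dimensionality are assumptions for illustration and are not taken from the iMoLD codebase.

    import torch
    import torch.nn as nn

    class ResidualVQ(nn.Module):
        # Each stage quantizes the residual left by the previous stage; the
        # per-stage codewords are summed into the final discrete representation.
        # All hyperparameters here are illustrative defaults, not iMoLD's.
        def __init__(self, num_stages=4, codebook_size=256, dim=64):
            super().__init__()
            self.codebooks = nn.ModuleList(
                nn.Embedding(codebook_size, dim) for _ in range(num_stages)
            )

        def forward(self, z):  # z: (batch, dim) continuous encoder output
            residual = z
            quantized = torch.zeros_like(z)
            for codebook in self.codebooks:
                # Nearest codeword to the current residual (Euclidean distance).
                dists = torch.cdist(residual, codebook.weight)  # (batch, codebook_size)
                codes = codebook(dists.argmin(dim=-1))          # (batch, dim)
                quantized = quantized + codes
                residual = residual - codes
            # Straight-through estimator: gradients bypass the discrete argmin.
            return z + (quantized - z).detach()

Usage would look like z_q = ResidualVQ()(encoder_output). In the paper's framework the quantized representation is subsequently separated into invariant and spurious components; that separation logic is beyond this sketch.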
Pages: 18