Bidirectional Learning for Offline Infinite-width Model-based Optimization

Authors
Chen, Can [1]
Zhang, Yingxue [2]
Fu, Jie [3]
Liu, Xue [1]
Coates, Mark [1]
Affiliations
[1] McGill University, Montreal, QC, Canada
[2] Huawei Noah's Ark Lab, Montreal, QC, Canada
[3] Beijing Academy of Artificial Intelligence, Beijing, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In offline model-based optimization, we strive to maximize a black-box objective function by only leveraging a static dataset of designs and their scores. This problem setting arises in numerous fields including the design of materials, robots, DNA sequences, and proteins. Recent approaches train a deep neural network (DNN) on the static dataset to act as a proxy function, and then perform gradient ascent on the existing designs to obtain potentially high-scoring designs. This methodology frequently suffers from the out-of-distribution problem where the proxy function often returns poor designs. To mitigate this problem, we propose BiDirectional learning for offline Infinite-width model-based optimization (BDI). BDI consists of two mappings: the forward mapping leverages the static dataset to predict the scores of the high-scoring designs, and the backward mapping leverages the high-scoring designs to predict the scores of the static dataset. The backward mapping, neglected in previous work, can distill more information from the static dataset into the high-scoring designs, which effectively mitigates the out-of-distribution problem. For a finite-width DNN model, the loss function of the backward mapping is intractable and only has an approximate form, which leads to a significant deterioration of the design quality. We thus adopt an infinite-width DNN model, and propose to employ the corresponding neural tangent kernel to yield a closed-form loss for more accurate design updates. Experiments on various tasks verify the effectiveness of BDI. The code is available here.
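
The two mappings described in the abstract can be illustrated with a small kernel-regression sketch. This is not the authors' implementation: it assumes an RBF kernel as a stand-in for the neural tangent kernel, a ridge term for numerical stability, and a hypothetical target score `y_high` for the candidate designs. The function names (`rbf_kernel`, `krr_predict`, `bidirectional_loss`) are illustrative only.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Stand-in kernel (assumption); the paper uses a neural tangent kernel.
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def krr_predict(x_train, y_train, x_query, ridge=1e-3):
    # Closed-form kernel ridge regression prediction at x_query.
    k_tt = rbf_kernel(x_train, x_train)
    k_qt = rbf_kernel(x_query, x_train)
    alpha = np.linalg.solve(k_tt + ridge * np.eye(len(x_train)), y_train)
    return k_qt @ alpha

def bidirectional_loss(x_static, y_static, x_high, y_high, ridge=1e-3):
    # Forward: fit on the static dataset, predict the candidate designs' scores.
    forward = 0.5 * np.sum(
        (y_high - krr_predict(x_static, y_static, x_high, ridge)) ** 2)
    # Backward: fit on the candidate designs, predict the static dataset's scores.
    backward = 0.5 * np.sum(
        (y_static - krr_predict(x_high, y_high, x_static, ridge)) ** 2)
    return forward + backward, forward, backward

# Toy data (hypothetical): 20 designs in 3 dimensions with a synthetic score.
rng = np.random.default_rng(0)
x_static = rng.normal(size=(20, 3))
y_static = -np.sum(x_static**2, axis=1)

# Initialise candidates from the 4 best static designs; the target score
# assigned to them is an assumption made for this sketch.
top = np.argsort(y_static)[-4:]
x_high = x_static[top].copy()
y_high = np.full(4, y_static.max())

total, fwd, bwd = bidirectional_loss(x_static, y_static, x_high, y_high)
print(f"forward={fwd:.4f} backward={bwd:.4f} total={total:.4f}")
```

Because both terms are closed-form in `x_high`, the candidate designs could in principle be updated by differentiating `total` with respect to `x_high`; an automatic-differentiation library would be the natural choice for that step, which is omitted here.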
Pages: 14