Parallel-mentoring for Offline Model-based Optimization

Authors
Chen, Can [1 ,2 ]
Beckham, Christopher [2 ,3 ]
Liu, Zixuan [4 ]
Liu, Xue [1 ,2 ]
Pal, Christopher [2 ,3 ]
Affiliations
[1] McGill University, Montreal, QC, Canada
[2] Mila - Quebec AI Institute, Montreal, QC, Canada
[3] Polytechnique Montréal, Montreal, QC, Canada
[4] University of Washington, Seattle, WA 98195, USA
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We study offline model-based optimization, which aims to maximize a black-box objective function using only a static dataset of designs and scores. The designs span a variety of domains, including materials, robots, and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and then performs gradient ascent to obtain new designs. However, this often yields poor designs because the proxy is inaccurate on out-of-distribution designs. Recent studies indicate that (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring, a novel and effective method that facilitates mentoring among parallel proxies, creating a more robust ensemble that mitigates the out-of-distribution issue. We focus on the three-proxy case; our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises when the consensus is incorrect. To alleviate this, we introduce an adaptive soft-labeling module whose soft labels are initialized as the consensus labels. Based on bi-level optimization, this module fine-tunes the proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor the proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
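The first module described above, voting-based pairwise supervision, reduces to a majority vote over the proxies' pairwise rankings. The following is a minimal NumPy sketch of that step under stated assumptions: the three toy linear scorers stand in for trained proxies, and the names pairwise_labels and consensus_labels are illustrative, not taken from the paper's released code. The adaptive soft-labeling module (the bi-level optimization) is not shown.

import numpy as np
from itertools import combinations

def pairwise_labels(proxy, designs):
    # Label each design pair (i, j) with 1.0 if the proxy scores design i above design j.
    scores = np.array([proxy(x) for x in designs])
    pairs = list(combinations(range(len(designs)), 2))
    labels = np.array([1.0 if scores[i] > scores[j] else 0.0 for i, j in pairs])
    return pairs, labels

def consensus_labels(proxies, designs):
    # Majority-vote the proxies' pairwise comparison labels into consensus labels.
    results = [pairwise_labels(p, designs) for p in proxies]
    pairs = results[0][0]
    votes = np.stack([labels for _, labels in results])   # shape: (num_proxies, num_pairs)
    majority = (votes.sum(axis=0) > len(proxies) / 2).astype(float)
    return pairs, majority

# Toy usage: three random linear proxies in place of the trained ensemble members.
rng = np.random.default_rng(0)
designs = rng.normal(size=(8, 5))            # 8 candidate designs, 5 features each
proxies = [(lambda x, w=rng.normal(size=5): float(x @ w)) for _ in range(3)]
pairs, labels = consensus_labels(proxies, designs)
print(pairs[:3], labels[:3])                 # one consensus label per design pair

In the full method, each proxy is then fine-tuned against (soft versions of) these consensus labels so that the proxies mentor one another, while the bi-level outer loop refines the soft labels themselves.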
Pages: 18