Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

Authors
Liu, Zhihan [1]
Lu, Miao [2]
Xiong, Wei [3]
Zhong, Han [4]
Hu, Hao [5]
Zhang, Shenao [1]
Zheng, Sirui [1]
Yang, Zhuoran [6]
Wang, Zhaoran [1]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Univ Illinois, Urbana, IL USA
[4] Peking Univ, Beijing, Peoples R China
[5] Tsinghua Univ, Beijing, Peoples R China
[6] Yale Univ, New Haven, CT 06520 USA
Funding
National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In reinforcement learning (RL), balancing exploration and exploitation is crucial for achieving an optimal policy in a sample-efficient way. To this end, existing sample-efficient algorithms typically consist of three components: estimation, planning, and exploration. However, to cope with general function approximators, most of them involve impractical algorithmic components to incentivize exploration, such as data-dependent level-set constraints or complicated sampling procedures. To address this challenge, we propose an easy-to-implement RL framework called Maximize to Explore (MEX), which needs only to optimize a single objective, without constraints, that integrates the estimation and planning components while automatically balancing exploration and exploitation. Theoretically, we prove that MEX achieves sublinear regret with general function approximators and extends to the zero-sum Markov game setting. Meanwhile, we adapt deep RL baselines to design practical versions of MEX in both the model-based and model-free settings, which outperform baselines in various MuJoCo environments with sparse reward by a stable margin. Compared with existing sample-efficient algorithms with general function approximators, MEX achieves similar sample efficiency while enjoying a lower computational cost and better compatibility with modern deep RL methods. Our code is available at https://github.com/agentification/MEX.
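The idea of a single unconstrained objective that fuses estimation and planning can be illustrated on a toy multi-armed bandit. The sketch below is not the authors' implementation. It assumes, for illustration only, that the objective has the form "planned value of a hypothesis minus eta times its estimation loss on the data collected so far"; the abstract itself does not state the formula. The function name `mex_select`, the finite hypothesis class, the squared-error loss, and the parameter `eta` are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def mex_select(
    hypotheses: Sequence[Sequence[float]],
    data: Sequence[Tuple[int, float]],
    eta: float,
) -> Tuple[int, int]:
    """Pick the hypothesis maximizing (planned value - eta * estimation loss).

    Each hypothesis is a list of predicted mean rewards, one per arm.
    `data` holds (arm, observed_reward) pairs collected so far.
    Returns (index of the chosen hypothesis, its greedy arm).
    The objective form is an assumption made for this sketch.
    """
    best_idx, best_score = 0, float("-inf")
    for i, f in enumerate(hypotheses):
        planned_value = max(f)  # planning: best value the hypothesis promises
        loss = sum((f[a] - r) ** 2 for a, r in data)  # estimation: fit to data
        score = planned_value - eta * loss  # single fused objective
        if score > best_score:
            best_idx, best_score = i, score
    chosen = hypotheses[best_idx]
    greedy_arm = max(range(len(chosen)), key=lambda a: chosen[a])
    return best_idx, greedy_arm


# Two candidate reward models for a two-armed bandit (illustrative values).
hypotheses: List[List[float]] = [[0.2, 0.9], [0.2, 0.1]]

# With no data, the optimistic hypothesis wins and arm 1 gets explored.
first_choice = mex_select(hypotheses, [], eta=1.0)

# After arm 1 repeatedly returns 0.1, the loss term overrides the optimism.
later_choice = mex_select(hypotheses, [(1, 0.1)] * 5, eta=1.0)
```

In this toy setting, exploration comes from the value term favoring optimistic hypotheses, and exploitation comes from the loss term ruling out hypotheses that contradict the data. No confidence set or sampling procedure is constructed.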
Pages: 15