Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

Cited by: 0
Authors
Liu, Zhihan [1]
Lu, Miao [2]
Xiong, Wei [3]
Zhong, Han [4]
Hu, Hao [5]
Zhang, Shenao [1]
Zheng, Sirui [1]
Yang, Zhuoran [6]
Wang, Zhaoran [1]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Univ Illinois, Urbana, IL USA
[4] Peking Univ, Beijing, Peoples R China
[5] Tsinghua Univ, Beijing, Peoples R China
[6] Yale Univ, New Haven, CT 06520 USA
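Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract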
In reinforcement learning (RL), balancing exploration and exploitation is crucial for achieving an optimal policy in a sample-efficient way. To this end, existing sample-efficient algorithms typically consist of three components: estimation, planning, and exploration. However, to cope with general function approximators, most of them rely on impractical algorithmic components to incentivize exploration, such as data-dependent level-set constraints or complicated sampling procedures. To address this challenge, we propose an easy-to-implement RL framework called Maximize to Explore (MEX), which requires only the unconstrained optimization of a single objective that integrates the estimation and planning components while balancing exploration and exploitation automatically. Theoretically, we prove that MEX achieves sublinear regret with general function approximators and extends to the zero-sum Markov game setting. Meanwhile, we adapt deep RL baselines to design practical versions of MEX in both the model-based and model-free settings, which outperform the baselines in various MuJoCo environments with sparse rewards by a stable margin. Compared with existing sample-efficient algorithms with general function approximators, MEX achieves similar sample efficiency while incurring a lower computational cost and being more compatible with modern deep RL methods. Our code is available at https://github.com/agentification/MEX.
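The abstract does not state the objective itself, so the following is only a minimal sketch of what a single unconstrained objective fusing estimation and planning could look like. It assumes a multi-armed bandit with a finite hypothesis class and a squared-error loss; the function name `mex_select`, the weight `eta`, and the score "planned value minus `eta` times loss" are illustrative assumptions, not the authors' implementation.

```python
def mex_select(hypotheses, data, eta):
    """Pick a hypothesis and an arm by maximizing one unconstrained score.

    Assumed illustrative setting (not taken from the abstract):
      hypotheses: list of tuples, each giving a candidate mean reward per arm
      data:       list of (arm, observed_reward) pairs collected so far
      eta:        weight trading off data fit against optimistic value
    """
    def score(f):
        planned_value = max(f)  # planning: best value achievable under f
        loss = sum((f[a] - r) ** 2 for a, r in data)  # estimation: fit to data
        return planned_value - eta * loss

    best = max(hypotheses, key=score)
    arm = max(range(len(best)), key=lambda a: best[a])  # act greedily under best
    return best, arm


hypotheses = [(0.2, 0.8), (0.9, 0.1)]

# With no data, the loss term vanishes and the most optimistic hypothesis wins.
print(mex_select(hypotheses, [], eta=1.0))  # ((0.9, 0.1), 0)

# After arm 0 repeatedly returns 0.2, the optimistic hypothesis is penalized.
print(mex_select(hypotheses, [(0, 0.2)] * 5, eta=1.0))  # ((0.2, 0.8), 1)
```

In this toy setting, exploration comes from the value term favoring optimistic hypotheses, and the loss term discards hypotheses once the data contradicts them, so no explicit confidence set or sampling step is needed.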
Pages: 15