Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration

Citations: 0
Authors
Liu, Zhihan [1 ]
Lu, Miao [2 ]
Xiong, Wei [3 ]
Zhong, Han [4 ]
Hu, Hao [5 ]
Zhang, Shenao [1 ]
Zheng, Sirui [1 ]
Yang, Zhuoran [6 ]
Wang, Zhaoran [1 ]
Affiliations
[1] Northwestern Univ, Evanston, IL 60208 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Univ Illinois, Urbana, IL USA
[4] Peking Univ, Beijing, Peoples R China
[5] Tsinghua Univ, Beijing, Peoples R China
[6] Yale Univ, New Haven, CT 06520 USA
Funding
National Science Foundation (USA)
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In reinforcement learning (RL), balancing exploration and exploitation is crucial for achieving an optimal policy in a sample-efficient way. To this end, existing sample-efficient algorithms typically consist of three components: estimation, planning, and exploration. However, to cope with general function approximators, most of them involve impractical algorithmic components to incentivize exploration, such as data-dependent level-set constraints or complicated sampling procedures. To address this challenge, we propose an easy-to-implement RL framework called Maximize to Explore (MEX), which only needs to optimize a single unconstrained objective that integrates the estimation and planning components while automatically balancing exploration and exploitation. Theoretically, we prove that MEX achieves sublinear regret with general function approximators and is extendable to the zero-sum Markov game setting. Meanwhile, we adapt deep RL baselines to design practical versions of MEX in both the model-based and model-free settings, which outperform baselines in various MuJoCo environments with sparse rewards by a stable margin. Compared with existing sample-efficient algorithms with general function approximators, MEX achieves similar sample efficiency while enjoying a lower computational cost and greater compatibility with modern deep RL methods. Our code is available at https://github.com/agentification/MEX.
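The core idea of the abstract, fusing the estimation loss and the planning (value) term into one unconstrained objective, can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's implementation: a finite hypothesis class over a 3-armed bandit, a squared-error estimation loss, and a temperature `eta` trading off the optimistic value term against data fit.

```python
import numpy as np

# Toy MEX-style selection over a finite hypothesis class (illustrative only).
# Each hypothesis f predicts mean rewards for a 3-armed bandit.

rng = np.random.default_rng(0)

# Dataset of (action, observed reward) pairs from the true arm means.
true_means = np.array([0.2, 0.5, 0.8])
actions = rng.integers(0, 3, size=100)
rewards = true_means[actions] + 0.1 * rng.standard_normal(100)

# Finite hypothesis class: each hypothesis is a vector of predicted means.
hypotheses = [np.array(h) for h in
              [(0.2, 0.5, 0.8),   # close to the truth
               (0.9, 0.1, 0.1),   # optimistic but fits the data poorly
               (0.3, 0.3, 0.3)]]  # fits moderately, low promised value

eta = 1.0  # temperature trading off optimism against estimation loss

def mex_objective(f):
    # Planning term: the best value this hypothesis promises (optimism).
    value = f.max()
    # Estimation term: mean squared error of the hypothesis on the data.
    loss = np.mean((f[actions] - rewards) ** 2)
    # Single unconstrained objective fusing estimation and planning.
    return value - eta * loss

# "Maximize to explore": pick the hypothesis maximizing the fused objective,
# then act greedily with respect to it.
best = max(hypotheses, key=mex_objective)
print(int(best.argmax()))
```

A hypothesis that promises a high value but contradicts the data pays a large loss penalty, so exploration is incentivized without any explicit level-set constraint or sampling procedure, mirroring the framework the abstract describes.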
Pages: 15