Average Reward Optimization with Multiple Discounting Reinforcement Learners

Cited by: 2
Authors
Reinke, Chris [1]
Uchibe, Eiji [1,2]
Doya, Kenji [1]
Affiliations
[1] Okinawa Institute of Science and Technology, Onna, Okinawa 9040495, Japan
[2] ATR Computational Neuroscience Laboratories, Kyoto 6190288, Japan
Keywords
Reinforcement learning; Average reward; Model-free; Value-based; Q-learning; Modular
DOI
10.1007/978-3-319-70087-8_81
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Maximizing average reward is a major goal in reinforcement learning. Existing model-free, value-based algorithms such as R-Learning use average-adjusted values. We propose a different framework, the Average Reward Independent Gamma Ensemble (AR-IGE). It is based on an ensemble of discounting Q-learning modules, each with a different discount factor. Existing algorithms learn only the optimal policy and its average reward. In contrast, the AR-IGE learns several different policies and the average reward that each of them yields. We prove the optimality of the AR-IGE in episodic, deterministic problems where rewards are given at several goal states. Furthermore, we show that the AR-IGE outperforms existing algorithms in such problems, especially when the policy has to change because the task changes. The AR-IGE represents a new way to optimize average reward that could lead to further improvements in the field.
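The ensemble idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: a hypothetical deterministic corridor task with two goal states, the hypothetical helpers `step`, `q_learning`, `greedy_average_reward` and `ar_ige_sketch`, and the discount factors 0.1, 0.3 and 0.9. Each module is a tabular Q-learner with its own discount factor. As a simplification, each module's average reward is measured by rolling out its greedy policy and dividing the episode's total reward by its number of steps, and the module with the highest average reward is selected.

```python
import random

# Hypothetical toy task (not from the paper): a deterministic corridor with
# states 0..9. The agent starts in state 3; entering state 0 ends the episode
# with reward 1, entering state 9 ends it with reward 10.
N_STATES = 10
START = 3
GOAL_REWARDS = {0: 1.0, 9: 10.0}
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    nxt = state + action
    if nxt in GOAL_REWARDS:
        return nxt, GOAL_REWARDS[nxt], True
    return nxt, 0.0, False


def q_learning(gamma, episodes=2000, alpha=1.0, seed=0):
    """One ensemble module: tabular Q-learning with a single discount factor.

    The behaviour policy is uniformly random, which is fine for off-policy
    Q-learning; alpha = 1 is only reasonable because the task is deterministic.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = START, False
        while not done:
            a = rng.choice(ACTIONS)
            s2, r, done = step(s, a)
            # No bootstrapping from terminal (goal) states.
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q


def greedy_average_reward(q, max_steps=100):
    """Roll out the greedy policy of q from START; return reward per step."""
    s, total, steps = START, 0.0, 0
    for steps in range(1, max_steps + 1):
        a = max(ACTIONS, key=lambda b: q[(s, b)])
        s, r, done = step(s, a)
        total += r
        if done:
            break
    return total / steps


def ar_ige_sketch(gammas=(0.1, 0.3, 0.9)):
    """Learn one module per gamma, score each module's greedy policy by its
    average reward, and return (best_gamma, scores)."""
    scores = {g: greedy_average_reward(q_learning(g)) for g in gammas}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    print(ar_ige_sketch())
```

In this toy task the small discount factors favour the nearby goal (reward 1 after 3 steps, about 0.33 per step), whereas γ = 0.9 favours the distant goal (reward 10 after 6 steps, about 1.67 per step), so the sketch selects the γ = 0.9 module. Because every module keeps its own policy, the ensemble holds several candidate policies with their average rewards rather than a single one.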
Pages: 789-800
Number of pages: 12