Average Reward Optimization with Multiple Discounting Reinforcement Learners

Cited by: 2
Authors
Reinke, Chris [1 ]
Uchibe, Eiji [1 ,2 ]
Doya, Kenji [1 ]
Affiliations
[1] Okinawa Inst Sci & Technol, Onna, Okinawa 9040495, Japan
[2] ATR Computat Neurosci Labs, Kyoto 6190288, Japan
Keywords
Reinforcement learning; Average reward; Model-free; Value-based; Q-learning; Modular
DOI
10.1007/978-3-319-70087-8_81
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Maximization of the average reward is a major goal in reinforcement learning. Existing model-free, value-based algorithms, such as R-Learning, use average-adjusted values. We propose a different framework, the Average Reward Independent Gamma Ensemble (AR-IGE), based on an ensemble of discounting Q-learning modules, each with a different discount factor. Whereas existing algorithms learn only the optimal policy and its average reward, the AR-IGE learns multiple policies and their resulting average rewards. We prove the optimality of the AR-IGE in episodic, deterministic problems where rewards are given at several goal states. Furthermore, we show that the AR-IGE outperforms existing algorithms in such problems, especially when the task changes and the policy must change with it. The AR-IGE represents a new way to optimize the average reward that could lead to further improvements in the field.
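The abstract describes the AR-IGE only at a high level. As a rough illustration of the ensemble idea, the following is a minimal sketch assuming a small tabular environment; the environment interface, the epsilon-greedy action rule, the way the acting module is selected, and the average-reward tracking step are all assumptions made here for illustration, not the authors' published algorithm.

```python
# Minimal sketch of an "independent gamma ensemble" of tabular Q-learners,
# in the spirit of the AR-IGE described in the abstract. The environment
# interface (reset()/step()), the epsilon-greedy action rule, the
# module-selection rule, and the average-reward tracking step are all
# illustrative assumptions, not the authors' published algorithm.
import numpy as np


class QModule:
    """One tabular Q-learning module with its own discount factor."""

    def __init__(self, n_states, n_actions, gamma, alpha=0.1, beta=0.01):
        self.Q = np.zeros((n_states, n_actions))
        self.gamma = gamma      # this module's discount factor
        self.alpha = alpha      # Q-learning step size
        self.beta = beta        # step size for the average-reward estimate
        self.avg_reward = 0.0   # estimated average reward of the greedy policy

    def update(self, s, a, r, s_next, done):
        # Standard Q-learning backup, discounted by this module's gamma.
        target = r if done else r + self.gamma * self.Q[s_next].max()
        self.Q[s, a] += self.alpha * (target - self.Q[s, a])

    def track_average(self, episode_reward, episode_length):
        # Running estimate of reward per step under this module's policy.
        self.avg_reward += self.beta * (episode_reward / episode_length
                                        - self.avg_reward)


def run_ensemble(env, n_states, n_actions,
                 gammas=(0.5, 0.8, 0.9, 0.99),
                 episodes=500, epsilon=0.1, seed=0):
    """Train all modules on a shared experience stream; act with the module
    whose greedy policy currently has the highest estimated average reward."""
    rng = np.random.default_rng(seed)
    modules = [QModule(n_states, n_actions, g) for g in gammas]
    for _ in range(episodes):
        s, done = env.reset(), False      # assumed interface: reset() -> state
        total_reward, steps = 0.0, 0
        actor = max(modules, key=lambda m: m.avg_reward)
        while not done:
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(actor.Q[s].argmax())
            s_next, r, done = env.step(a)  # assumed: step(a) -> (state, reward, done)
            for m in modules:              # every module learns from the same data
                m.update(s, a, r, s_next, done)
            s, total_reward, steps = s_next, total_reward + r, steps + 1
        actor.track_average(total_reward, steps)
    return modules
```

The point of the ensemble structure in this sketch is that every module evaluates the same experience under its own discount factor, so the agent retains a menu of candidate policies whose estimated average rewards can be compared directly; it is this comparison that would allow a fast policy switch when the task changes.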
Pages: 789-800
Page count: 12