Average Reward Optimization with Multiple Discounting Reinforcement Learners

Cited by: 2
Authors
Reinke, Chris [1]
Uchibe, Eiji [1, 2]
Doya, Kenji [1]
Affiliations
[1] Okinawa Inst Sci & Technol, Onna, Okinawa 9040495, Japan
[2] ATR Computat Neurosci Labs, Kyoto 6190288, Japan
Keywords
Reinforcement learning; Average reward; Model-free; Value-based; Q-learning; Modular
DOI
10.1007/978-3-319-70087-8_81
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Maximization of average reward is a major goal in reinforcement learning. Existing model-free, value-based algorithms such as R-Learning use average adjusted values. We propose a different framework, the Average Reward Independent Gamma Ensemble (AR-IGE). It is based on an ensemble of discounting Q-learning modules with a different discount factor for each module. Existing algorithms only learn the optimal policy and its average reward. In contrast, the AR-IGE learns different policies and their resulting average rewards. We prove the optimality of the AR-IGE in episodic and deterministic problems where rewards are given at several goal states. Furthermore, we show that the AR-IGE outperforms existing algorithms in such problems, especially in situations where policies have to be changed due to changes in the task. The AR-IGE represents a new way to optimize average reward that could lead to further improvements in the field.
Pages: 789-800
Number of pages: 12
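
The abstract describes an ensemble of discounting Q-learning modules, one discount factor per module, that learns several policies together with their average rewards. The sketch below illustrates that general idea only and is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the ten-state deterministic chain with two goal states, the names `GOALS`, `train_modules`, and `average_reward`, the exhaustive sweep over transitions (used in place of exploration so the example is deterministic), and the rule of selecting the module whose greedy policy yields the highest reward per step.

```python
# Illustrative sketch only. The toy task, all names, and the selection rule
# are assumptions; this is not the AR-IGE algorithm as published.

GOALS = {0: 2.0, 9: 10.0}   # hypothetical goal states and their rewards
START = 1                   # every episode starts here
ACTIONS = (-1, +1)          # move left / move right on the chain


def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    nxt = state + action
    if nxt in GOALS:
        return nxt, GOALS[nxt], True
    return nxt, 0.0, False


def train_modules(gammas, sweeps=50, alpha=1.0):
    """Keep one tabular Q-table per discount factor.

    All modules are updated from the same transitions with the standard
    Q-learning rule. alpha=1.0 is sufficient because the task is deterministic.
    """
    non_terminal = [s for s in range(min(GOALS), max(GOALS) + 1) if s not in GOALS]
    tables = [{s: [0.0, 0.0] for s in non_terminal} for _ in gammas]
    for _ in range(sweeps):
        for s in non_terminal:
            for a_idx, a in enumerate(ACTIONS):
                nxt, r, done = step(s, a)
                for gamma, q in zip(gammas, tables):
                    target = r if done else r + gamma * max(q[nxt])
                    q[s][a_idx] = (1 - alpha) * q[s][a_idx] + alpha * target
    return tables


def average_reward(q, max_steps=100):
    """Reward per step of the greedy policy over one episode from START."""
    s, total, steps, done = START, 0.0, 0, False
    while not done and steps < max_steps:
        a_idx = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s, r, done = step(s, ACTIONS[a_idx])
        total += r
        steps += 1
    return total / steps


gammas = [0.5, 0.9, 0.99]
tables = train_modules(gammas)
avg_rewards = [average_reward(q) for q in tables]
best = max(range(len(gammas)), key=lambda i: avg_rewards[i])
print(dict(zip(gammas, avg_rewards)), "-> selected gamma:", gammas[best])
```

In this toy task the module with discount factor 0.5 heads for the nearby goal (reward 2 after 1 step, average 2.0), while the modules with 0.9 and 0.99 head for the distant goal (reward 10 after 8 steps, average 1.25). The largest discount factor therefore does not give the best average reward here, which is the kind of situation where comparing several discounting modules is useful.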