Average Reward Optimization with Multiple Discounting Reinforcement Learners

Times cited: 2

Authors:

Reinke, Chris [1]

Uchibe, Eiji [1,2]

Doya, Kenji [1]

Affiliations:

[1] Okinawa Inst Sci & Technol, Onna, Okinawa 9040495, Japan

[2] ATR Computat Neurosci Labs, Kyoto 6190288, Japan

Source:

NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I | 2017 / Vol. 10634

Keywords:

Reinforcement learning; Average reward; Model-free; Value-based; Q-learning; Modular

DOI:

10.1007/978-3-319-70087-8_81

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Maximization of average reward is a major goal in reinforcement learning. Existing model-free, value-based algorithms such as R-Learning use average-adjusted values. We propose a different framework, the Average Reward Independent Gamma Ensemble (AR-IGE). It is based on an ensemble of discounting Q-learning modules, each with a different discount factor. Existing algorithms only learn the optimal policy and its average reward. In contrast, the AR-IGE learns different policies and their resulting average rewards. We prove the optimality of the AR-IGE in episodic and deterministic problems where rewards are given at several goal states. Furthermore, we show that the AR-IGE outperforms existing algorithms in such problems, especially in situations where policies have to be changed due to changes in the task. The AR-IGE represents a new way to optimize average reward that could lead to further improvements in the field.
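The core idea in the abstract (several Q-learning modules, each with its own discount factor, then keep the policy with the best average reward) can be illustrated with a small toy sketch. Everything concrete below is an assumption for illustration, not the authors' implementation: the corridor task, the reward values, the set of discount factors, the learning rate, and the use of a single greedy rollout to measure each module's average reward per step (valid here only because the toy task is deterministic). The paper's own average reward estimator and module selection rule may differ.

```python
import random

# Hypothetical deterministic, episodic corridor: states 0..N_STATES-1.
# Entering state 0 gives LEFT_REWARD, entering the last state gives RIGHT_REWARD;
# both goal states end the episode. All other transitions give reward 0.
N_STATES = 8
START = 1
LEFT_REWARD, RIGHT_REWARD = 1.0, 5.0
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the deterministic corridor."""
    nxt = state - 1 if action == 0 else state + 1
    if nxt == 0:
        return nxt, LEFT_REWARD, True
    if nxt == N_STATES - 1:
        return nxt, RIGHT_REWARD, True
    return nxt, 0.0, False


def train_q_module(gamma, episodes=3000, alpha=0.5, seed=0):
    """Tabular Q-learning with one fixed discount factor (one ensemble module).
    A uniform random behavior policy is used; Q-learning is off-policy, so the
    greedy policy for this gamma is still learned."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randint(1, N_STATES - 2)  # random non-goal start state
        done = False
        while not done:
            action = rng.choice(ACTIONS)
            nxt, reward, done = step(state, action)
            # Terminal transitions have no bootstrapped future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_average_reward(q, start=START, max_steps=100):
    """Roll out the module's greedy policy once (the task is deterministic) and
    return (reward per step, first action taken)."""
    state, total, steps, first_action = start, 0.0, 0, None
    done = False
    while not done and steps < max_steps:
        action = max(ACTIONS, key=lambda a: q[state][a])
        if first_action is None:
            first_action = action
        state, reward, done = step(state, action)
        total += reward
        steps += 1
    return total / steps, first_action


def ensemble_select(gammas):
    """Train one module per discount factor and keep the module whose greedy
    policy obtains the highest average reward per step."""
    results = []
    for gamma in gammas:
        q = train_q_module(gamma)
        avg, first = greedy_average_reward(q)
        results.append((gamma, avg, first))
    best = max(results, key=lambda r: r[1])  # first maximum wins on ties
    return best, results


best, results = ensemble_select([0.1, 0.5, 0.9, 0.99])
for gamma, avg, first in results:
    direction = "left" if first == 0 else "right"
    print(f"gamma={gamma}: first action={direction}, avg reward={avg:.3f}")
print("selected gamma:", best[0])
```

In this toy corridor, the modules with small discount factors (0.1, 0.5) prefer the nearby goal (reward 1 after 1 step, i.e. 1.0 per step), while those with large discount factors (0.9, 0.99) prefer the distant goal (reward 5 after 6 steps, about 0.833 per step), so the selection step keeps a small-gamma module. No single fixed discount factor is guaranteed to yield the average-reward-optimal policy, which is why comparing the average rewards of several discounting modules is useful.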


Pages: 789-800

Number of pages: 12
