Model-based average reward reinforcement learning

Cited by: 36

Authors:

Tadepalli, P [1]

Ok, D

Affiliations:

[1] Oregon State Univ, Dept Comp Sci, Corvallis, OR 97331 USA

[2] Korean Army Comp Ctr, Chungnam 320919, South Korea

Source:

ARTIFICIAL INTELLIGENCE | 1998 / Vol. 100 / Issue 1-2

Keywords:

machine learning; Reinforcement Learning; average reward; model-based; exploration; Bayesian networks; linear regression; AGV scheduling;

DOI:

10.1016/S0004-3702(98)00002-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based Average-reward Reinforcement Learning method called H-learning and show that it converges more quickly and robustly than its discounted counterpart in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning that automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-Learning" performs better than the previously studied exploration strategies. To scale H-learning to larger state spaces, we extend it to learn action models and reward functions in the form of dynamic Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are effective in significantly reducing the space requirement of H-learning and making it converge faster in some AGV scheduling tasks. (C) 1998 Published by Elsevier Science B.V.


Pages: 177-224

Page count: 48
