Model-based average reward reinforcement learning

Cited by: 36
Authors
Tadepalli, P [1]
Ok, D
Affiliations
[1] Oregon State Univ, Dept Comp Sci, Corvallis, OR 97331 USA
[2] Korean Army Comp Ctr, Chungnam 320919, South Korea
Keywords
machine learning; reinforcement learning; average reward; model-based; exploration; Bayesian networks; linear regression; AGV scheduling
DOI
10.1016/S0004-3702(98)00002-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based Average-reward Reinforcement Learning method called H-learning and show that it converges more quickly and robustly than its discounted counterpart in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning that automatically explores the unexplored parts of the state space, while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-Learning" performs better than the previously studied exploration strategies. To scale H-learning to larger state spaces, we extend it to learn action models and reward functions in the form of dynamic Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are effective in significantly reducing the space requirement of H-learning and making it converge faster in some AGV scheduling tasks. © 1998 Published by Elsevier Science B.V.
Pages: 177-224
Number of pages: 48
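Illustrative note on the abstract: the average-reward criterion it contrasts with discounting can be sketched on a tiny, fully known model. The sketch below uses relative value iteration, a standard planning method, and is not H-learning itself, which estimates the model and the average reward online. The two-state model, the names `P`, `R`, `relative_value_iteration`, and `greedy_policy`, and all numeric values are assumptions made up for illustration.

```python
# Assumed toy model (not from the paper): 2 states, 2 actions.
# Action 0 = "stay", action 1 = "move to the other state".
# P[s][a] is the next-state distribution, R[s][a] the immediate reward.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: stay -> 0, move -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: stay -> 1, move -> 0
]
R = [
    [1.0, 0.0],  # state 0: staying pays 1, moving pays 0
    [2.0, 0.0],  # state 1: staying pays 2, moving pays 0
]


def backup(P, R, h, s, a):
    """One-step lookahead: immediate reward plus expected bias of the successor."""
    return R[s][a] + sum(p * h[j] for j, p in enumerate(P[s][a]))


def relative_value_iteration(P, R, ref_state=0, tol=1e-9, max_iters=10_000):
    """Estimate the optimal average reward rho and bias values h.

    After every sweep the backed-up value of ref_state is subtracted,
    which keeps h bounded; that subtracted value is the estimate of rho.
    """
    n_states = len(P)
    h = [0.0] * n_states
    rho = 0.0
    for _ in range(max_iters):
        backed_up = [
            max(backup(P, R, h, s, a) for a in range(len(P[s])))
            for s in range(n_states)
        ]
        rho = backed_up[ref_state]
        new_h = [v - rho for v in backed_up]
        delta = max(abs(x - y) for x, y in zip(new_h, h))
        h = new_h
        if delta < tol:
            break
    return rho, h


def greedy_policy(P, R, h):
    """Pick, in each state, the action with the largest one-step backup."""
    return [
        max(range(len(P[s])), key=lambda a: backup(P, R, h, s, a))
        for s in range(len(P))
    ]


rho, h = relative_value_iteration(P, R)
policy = greedy_policy(P, R, h)
print("average reward per step:", rho)
print("bias values:", h)
print("greedy policy:", policy)
```

In this toy model the greedy policy gives up the reward of 1 in state 0 in order to reach state 1 and collect 2 per step thereafter, which is the long-run behavior an average-reward criterion favors.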