Inverse Reinforcement Learning with the Average Reward Criterion

Cited by: 0
Authors
Wu, Feiyang [1 ]
Ke, Jingyang [1 ]
Wu, Anqi [1 ]
Affiliation
[1] Georgia Inst Technol, Coll Comp, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of Inverse Reinforcement Learning (IRL) with an average-reward criterion. The goal is to recover an unknown policy and a reward function when the agent only has samples of states and actions from an experienced agent. Previous IRL methods assume that the expert is trained in a discounted environment and that the discount factor is known. This work alleviates this assumption by proposing an average-reward framework with efficient learning algorithms. We develop novel stochastic first-order methods to solve the IRL problem under the average-reward setting, which requires solving an Average-reward Markov Decision Process (AMDP) as a subproblem. To solve the subproblem, we develop a Stochastic Policy Mirror Descent (SPMD) method under general state and action spaces that needs O(1/ε) steps of gradient computation. Equipped with SPMD, we propose the Inverse Policy Mirror Descent (IPMD) method for solving the IRL problem with an O(1/ε²) complexity. To the best of our knowledge, these complexity results are new in the IRL literature under the average-reward criterion. Finally, we corroborate our analysis with numerical experiments using the MuJoCo benchmark and additional control tasks.
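For context, the average-reward criterion referenced in the abstract is the long-run average of per-step rewards under a stationary policy, and policy mirror descent updates the policy by a regularized greedy step against the current action-value function. The display below is a minimal, textbook-style sketch of these two objects, assuming ergodic dynamics, a KL Bregman divergence, and a step size \eta_k; here Q^{\pi_k} denotes the (relative) action-value function of the current policy under the current reward estimate. The record itself does not specify the exact divergence, estimators, or reward-update rule used by SPMD and IPMD.

\rho(\pi) \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[ \sum_{t=1}^{T} r(s_t, a_t) \right] \qquad \text{(average-reward objective)}

\pi_{k+1}(\cdot \mid s) \;\in\; \operatorname*{argmax}_{p \in \Delta(\mathcal{A})} \left\{ \big\langle Q^{\pi_k}(s,\cdot),\, p \big\rangle \;-\; \tfrac{1}{\eta_k}\, D_{\mathrm{KL}}\!\big( p \,\big\|\, \pi_k(\cdot \mid s) \big) \right\} \qquad \text{(mirror descent step)}

In IRL methods of this type, such a policy update is typically nested inside an outer loop that adjusts the reward estimate toward consistency with the expert demonstrations; the O(1/ε²) figure in the abstract is the stated complexity of the overall IPMD procedure.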
Pages: 13
Related papers
50 records in total
  • [1] Full Gradient Deep Reinforcement Learning for Average-Reward Criterion
    Pagare, Tejas
    Borkar, Vivek
    Avrachenkov, Konstantin
    [J]. LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [2] On-Policy Deep Reinforcement Learning for the Average-Reward Criterion
    Zhang, Yiming
    Ross, Keith W.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [3] Hierarchical average reward reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 2629 - 2669
  • [4] Compatible Reward Inverse Reinforcement Learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Reward Identification in Inverse Reinforcement Learning
    Kim, Kuno
    Garg, Shivam
    Shiragur, Kirankumar
    Ermon, Stefano
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [6] Learning classifier system with average reward reinforcement learning
    Zang, Zhaoxiang
    Li, Dehua
    Wang, Junying
    Xia, Dan
    [J]. KNOWLEDGE-BASED SYSTEMS, 2013, 40 : 58 - 71
  • [7] Robust Average-Reward Reinforcement Learning
    Wang, Yue
    Velasquez, Alvaro
    Atia, George
    Prater-Bennette, Ashley
    Zou, Shaofeng
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2024, 80 : 719 - 803
  • [8] Active Learning for Reward Estimation in Inverse Reinforcement Learning
    Lopes, Manuel
    Melo, Francisco
    Montesano, Luis
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 31 - +