Inverse Reinforcement Learning with the Average Reward Criterion

Cited by: 0
|
Authors
Wu, Feiyang [1 ]
Ke, Jingyang [1 ]
Wu, Anqi [1 ]
Affiliations
[1] Georgia Inst Technol, Coll Comp, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Keywords
DOI
None available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of Inverse Reinforcement Learning (IRL) under an average-reward criterion. The goal is to recover an unknown policy and a reward function given only samples of states and actions from an expert agent. Previous IRL methods assume that the expert was trained in a discounted environment and that the discount factor is known. This work removes that assumption by proposing an average-reward framework with efficient learning algorithms. We develop novel stochastic first-order methods to solve the IRL problem under the average-reward setting, which requires solving an Average-reward Markov Decision Process (AMDP) as a subproblem. To solve the subproblem, we develop a Stochastic Policy Mirror Descent (SPMD) method under general state and action spaces that needs O(1/ε) steps of gradient computation. Equipped with SPMD, we propose the Inverse Policy Mirror Descent (IPMD) method for solving the IRL problem with an O(1/ε²) complexity. To the best of our knowledge, these complexity results are new in the IRL literature with the average-reward criterion. Finally, we corroborate our analysis with numerical experiments using the MuJoCo benchmark and additional control tasks.
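The mirror-descent-style policy update underlying SPMD can be illustrated with a small sketch. This is a hypothetical tabular variant, not the authors' implementation: the names `n_states`, `n_actions`, `eta`, and the random stand-in for the differential Q-function estimate are all illustrative assumptions, and the paper's method handles general state and action spaces with sampled gradient estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3  # illustrative tabular sizes
eta = 0.5                   # mirror-descent step size (assumed)

# Current policy: rows are states, columns are action probabilities.
policy = np.full((n_states, n_actions), 1.0 / n_actions)

# Stand-in for a stochastic estimate of the differential (bias) Q-function
# under the current policy; in SPMD this would come from sampled trajectories.
q_hat = rng.normal(size=(n_states, n_actions))

def spmd_step(policy, q_hat, eta):
    """One KL-regularized mirror-descent update:
    pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q_hat(s, a))."""
    logits = np.log(policy) + eta * q_hat
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    new_policy /= new_policy.sum(axis=1, keepdims=True)
    return new_policy

policy = spmd_step(policy, q_hat, eta)
```

With a KL (negative-entropy) mirror map this update has the closed multiplicative form above, so each row stays a valid probability distribution without any projection step.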
Pages: 13
Related papers
50 records in total
  • [31] Average Reward Reinforcement Learning for Semi-Markov Decision Processes
    Yang, Jiayuan
    Li, Yanjie
    Chen, Haoyao
    Li, Jiangang
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 768 - 777
  • [32] Adaptive aggregation for reinforcement learning in average reward Markov decision processes
    Ronald Ortner
    [J]. Annals of Operations Research, 2013, 208 : 321 - 336
  • [34] Risk-sensitive reinforcement learning algorithms with generalized average criterion
    Yin Chang-ming
    Wang Han-xing
    Zhao Fei
    [J]. APPLIED MATHEMATICS AND MECHANICS-ENGLISH EDITION, 2007, 28 (03) : 405 - 416
  • [37] A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines
    Zhou, Weichao
    Li, Wenchao
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [38] An Average-Reward Reinforcement Learning Algorithm based on Schweitzer's Transformation
    Li Jianjun
    Ren Jiangong
    Li Yanjie
    [J]. PROCEEDINGS OF THE 31ST CHINESE CONTROL CONFERENCE, 2012, : 2966 - 2970
  • [39] Average Reward Reinforcement Learning for Optimal On-route Charging of Electric Buses
    Chen, Wenzhuo
    Liang, Hao
    [J]. 2020 IEEE 92ND VEHICULAR TECHNOLOGY CONFERENCE (VTC2020-FALL), 2020
  • [40] Fuzzy decision processes with an average reward criterion
    Kurano, M
    Yasuda, M
    Nakagami, JI
    Yoshida, Y
    [J]. MATHEMATICAL AND COMPUTER MODELLING, 1999, 30 (7-8) : 7 - 20