Inverse Reinforcement Learning with the Average Reward Criterion

Cited by: 0
|
Authors
Wu, Feiyang [1 ]
Ke, Jingyang [1 ]
Wu, Anqi [1 ]
Affiliations
[1] Georgia Inst Technol, Coll Comp, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Keywords
DOI
None available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of Inverse Reinforcement Learning (IRL) under an average-reward criterion. The goal is to recover an unknown policy and a reward function given only samples of states and actions from an expert agent. Previous IRL methods assume that the expert was trained in a discounted environment and that the discount factor is known. This work removes that assumption by proposing an average-reward framework with efficient learning algorithms. We develop novel stochastic first-order methods to solve the IRL problem under the average-reward setting, which requires solving an Average-reward Markov Decision Process (AMDP) as a subproblem. To solve the subproblem, we develop a Stochastic Policy Mirror Descent (SPMD) method under general state and action spaces that needs O(1/ε) steps of gradient computation. Equipped with SPMD, we propose the Inverse Policy Mirror Descent (IPMD) method for solving the IRL problem with an O(1/ε²) complexity. To the best of our knowledge, these complexity results are new in the IRL literature with the average-reward criterion. Finally, we corroborate our analysis with numerical experiments using the MuJoCo benchmark and additional control tasks.
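The mirror-descent-style policy update underlying SPMD can be illustrated with a small sketch. This is a hypothetical tabular variant, not the authors' implementation: the names `n_states`, `n_actions`, `eta`, and the random stand-in for the differential Q-function estimate are all illustrative assumptions, and the paper's method handles general state and action spaces with sampled gradient estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3  # illustrative tabular sizes
eta = 0.5                   # mirror-descent step size (assumed)

# Current policy: rows are states, columns are action probabilities.
policy = np.full((n_states, n_actions), 1.0 / n_actions)

# Stand-in for a stochastic estimate of the differential (bias) Q-function
# under the current policy; in SPMD this would come from sampled trajectories.
q_hat = rng.normal(size=(n_states, n_actions))

def spmd_step(policy, q_hat, eta):
    """One KL-regularized mirror-descent update:
    pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q_hat(s, a))."""
    logits = np.log(policy) + eta * q_hat
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    new_policy /= new_policy.sum(axis=1, keepdims=True)
    return new_policy

policy = spmd_step(policy, q_hat, eta)
```

With a KL (negative-entropy) mirror map this update has the closed multiplicative form above, so each row stays a valid probability distribution without any projection step.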
Pages: 13
Related papers
50 records in total
  • [31] Average Reward Reinforcement Learning for Semi-Markov Decision Processes
    Yang, Jiayuan
    Li, Yanjie
    Chen, Haoyao
    Li, Jiangang
    [J]. NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 768 - 777
  • [32] Adaptive aggregation for reinforcement learning in average reward Markov decision processes
    Ronald Ortner
    [J]. Annals of Operations Research, 2013, 208 : 321 - 336
  • [34] Risk-sensitive reinforcement learning algorithms with generalized average criterion
    Yin Chang-ming
    Wang Han-xing
    Zhao Fei
    [J]. APPLIED MATHEMATICS AND MECHANICS-ENGLISH EDITION, 2007, 28 (03) : 405 - 416
  • [37] A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines
    Zhou, Weichao
    Li, Wenchao
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [38] An Average-Reward Reinforcement Learning Algorithm based on Schweitzer's Transformation
    Li Jianjun
    Ren Jiangong
    Li Yanjie
    [J]. PROCEEDINGS OF THE 31ST CHINESE CONTROL CONFERENCE, 2012, : 2966 - 2970
  • [39] Average Reward Reinforcement Learning for Optimal On-route Charging of Electric Buses
    Chen, Wenzhuo
    Liang, Hao
    [J]. 2020 IEEE 92ND VEHICULAR TECHNOLOGY CONFERENCE (VTC2020-FALL), 2020
  • [40] Fuzzy decision processes with an average reward criterion
    Kurano, M
    Yasuda, M
    Nakagami, JI
    Yoshida, Y
    [J]. MATHEMATICAL AND COMPUTER MODELLING, 1999, 30 (7-8) : 7 - 20