Inverse Reinforcement Learning with the Average Reward Criterion

Cited by: 0
Authors
Wu, Feiyang [1 ]
Ke, Jingyang [1 ]
Wu, Anqi [1 ]
Affiliation
[1] Georgia Inst Technol, Coll Comp, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of Inverse Reinforcement Learning (IRL) with an average-reward criterion. The goal is to recover an unknown policy and a reward function when the agent only has samples of states and actions from an experienced agent. Previous IRL methods assume that the expert is trained in a discounted environment and that the discount factor is known. This work alleviates this assumption by proposing an average-reward framework with efficient learning algorithms. We develop novel stochastic first-order methods to solve the IRL problem under the average-reward setting, which requires solving an Average-reward Markov Decision Process (AMDP) as a subproblem. To solve the subproblem, we develop a Stochastic Policy Mirror Descent (SPMD) method under general state and action spaces that needs O(1/ε) steps of gradient computation. Equipped with SPMD, we propose the Inverse Policy Mirror Descent (IPMD) method for solving the IRL problem with an O(1/ε²) complexity. To the best of our knowledge, these complexity results are new in the IRL literature under the average-reward criterion. Finally, we corroborate our analysis with numerical experiments using the MuJoCo benchmark and additional control tasks.
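For context, the average-reward criterion referenced in the abstract is the long-run average of per-step rewards under a stationary policy, and policy mirror descent updates the policy by a regularized greedy step against the current action-value function. The display below is a minimal, textbook-style sketch of these two objects, assuming ergodic dynamics, a KL Bregman divergence, and a step size \eta_k; here Q^{\pi_k} denotes the (relative) action-value function of the current policy under the current reward estimate. The record itself does not specify the exact divergence, estimators, or reward-update rule used by SPMD and IPMD.

\rho(\pi) \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[ \sum_{t=1}^{T} r(s_t, a_t) \right] \qquad \text{(average-reward objective)}

\pi_{k+1}(\cdot \mid s) \;\in\; \operatorname*{argmax}_{p \in \Delta(\mathcal{A})} \left\{ \big\langle Q^{\pi_k}(s,\cdot),\, p \big\rangle \;-\; \tfrac{1}{\eta_k}\, D_{\mathrm{KL}}\!\big( p \,\big\|\, \pi_k(\cdot \mid s) \big) \right\} \qquad \text{(mirror descent step)}

In IRL methods of this type, such a policy update is typically nested inside an outer loop that adjusts the reward estimate toward consistency with the expert demonstrations; the O(1/ε²) figure in the abstract is the stated complexity of the overall IPMD procedure.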
Pages: 13
Related papers
50 records in total
  • [1] Full Gradient Deep Reinforcement Learning for Average-Reward Criterion
    Pagare, Tejas
    Borkar, Vivek
    Avrachenkov, Konstantin
    [J]. LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [2] On-Policy Deep Reinforcement Learning for the Average-Reward Criterion
    Zhang, Yiming
    Ross, Keith W.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [3] Hierarchical average reward reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 2629 - 2669
  • [4] Compatible Reward Inverse Reinforcement Learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Reward Identification in Inverse Reinforcement Learning
    Kim, Kuno
    Garg, Shivam
    Shiragur, Kirankumar
    Ermon, Stefano
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [6] Learning classifier system with average reward reinforcement learning
    Zang, Zhaoxiang
    Li, Dehua
    Wang, Junying
    Xia, Dan
    [J]. KNOWLEDGE-BASED SYSTEMS, 2013, 40 : 58 - 71
  • [7] Robust Average-Reward Reinforcement Learning
    Wang, Yue
    Velasquez, Alvaro
    Atia, George
    Prater-Bennette, Ashley
    Zou, Shaofeng
    [J]. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2024, 80 : 719 - 803
  • [8] Active Learning for Reward Estimation in Inverse Reinforcement Learning
    Lopes, Manuel
    Melo, Francisco
    Montesano, Luis
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 31 - +