Sensitivity-Based Inverse Reinforcement Learning

Cited by: 0
Authors
Tao, Zhaorong [1 ]
Chen, Zhichao [1 ]
Li, Yanjie [1 ]
Affiliation
[1] Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen 518055, People's Republic of China
Keywords
Performance sensitivity; Inverse reinforcement learning; Reward function
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Inverse reinforcement learning (IRL) is the process of recovering a latent reward function from an expert's behavior. An optimal control policy is then generated through an optimization method such as reinforcement learning, so that the expert's behavior can be imitated. In this paper, we consider the inverse reinforcement learning principle from the viewpoint of performance sensitivity analysis. We then propose a novel analytical framework for inverse reinforcement learning by analyzing the performance difference formula between the expert's policy and any other policy. This framework extends standard inverse reinforcement learning to the case where the reward function depends on both states and actions. At the same time, it provides a unified approach to IRL under both the discounted-reward and the average-reward criteria in Markov decision processes. Finally, the validity of the corresponding results is verified on a grid-world problem.
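The abstract's central object is the performance difference formula from perturbation analysis. As a minimal sketch of its average-reward form in Cao's framework (the notation below is our assumption, not reproduced from the paper): for two ergodic policies with transition matrices $P$, $P'$, reward vectors $f$, $f'$, steady-state distributions $\pi$, $\pi'$, and average rewards $\eta = \pi f$, $\eta' = \pi' f'$,

\[
\eta' - \eta = \pi' \left[ (f' - f) + (P' - P)\, g \right], \qquad \text{where } (I - P)\, g + \eta e = f,
\]

with $g$ the performance potential solving the Poisson equation and $e$ the all-ones vector. Requiring the expert's policy to be optimal forces the right-hand side to be non-positive for every alternative policy; since a state-action reward $r(s,a)$ enters $f$ linearly, this yields linear constraints on $r$, for the average-reward and (via the analogous discounted formula) the discounted-reward criterion alike.

To make the idea concrete, the following is a minimal, self-contained sketch that recovers a state-action reward for a toy MDP from such linear optimality constraints. It uses the discounted constraints $Q(s, a^*) \ge Q(s, a)$ in the spirit of Ng and Russell's linear-programming IRL rather than the paper's sensitivity-based derivation; the toy dynamics, the margin eps, and all variable names are illustrative assumptions.

```python
# Sketch: IRL as a feasibility LP (assumed toy setup, not the paper's algorithm).
# Find r(s, a) such that a fixed "expert" policy satisfies the discounted
# Bellman optimality constraints Q(s, a*) >= Q(s, a) + eps for all a != a*.
import numpy as np
from scipy.optimize import linprog

S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# Random transition kernel: P[s, a] is a distribution over next states.
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)

expert = np.zeros(S, dtype=int)          # expert always takes action 0

# V = (I - gamma * P_expert)^{-1} r_expert, and
# Q(s, a) = r(s, a) + gamma * P[s, a] @ V, so every constraint
# Q(s, expert[s]) - Q(s, a) >= eps is linear in the flat reward vector r.
P_exp = np.stack([P[s, expert[s]] for s in range(S)])   # (S, S)
M = np.linalg.inv(np.eye(S) - gamma * P_exp)            # V = M @ r_expert

# Selector E with r_expert = E @ r for the flat (S*A,) reward vector r.
E = np.zeros((S, S * A))
for s in range(S):
    E[s, s * A + expert[s]] = 1.0

A_ub, b_ub, eps = [], [], 0.1
for s in range(S):
    for a in range(A):
        if a == expert[s]:
            continue
        # Q(s, a*) - Q(s, a) = (e_{s,a*} - e_{s,a}) @ r
        #                      + gamma * (P[s, a*] - P[s, a]) @ M @ E @ r
        row = np.zeros(S * A)
        row[s * A + expert[s]] += 1.0
        row[s * A + a] -= 1.0
        row += gamma * (P[s, expert[s]] - P[s, a]) @ M @ E
        A_ub.append(-row)                # linprog expects A_ub @ r <= b_ub
        b_ub.append(-eps)

# Feasibility LP with bounded rewards and a zero objective.
res = linprog(c=np.zeros(S * A), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-1.0, 1.0)] * (S * A), method="highs")
if res.success:
    print("feasible reward r(s, a):")
    print(res.x.reshape(S, A))
```

As in standard IRL, the feasible set is degenerate (constant rewards satisfy the constraints when eps = 0), which is why formulations of this kind typically add a margin or a regularizing objective to single out an informative reward.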
Pages: 2856-2861
Number of pages: 6