Hierarchical Adversarial Inverse Reinforcement Learning

Cited by: 3
Authors
Chen, Jiayu [1]
Lan, Tian [2]
Aggarwal, Vaneet [1,3]
Affiliations
[1] Purdue Univ, Sch Ind Engn, W Lafayette, IN 47907 USA
[2] George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA
[3] KAUST, Comp Sci Dept, Thuwal 23955, Saudi Arabia
Keywords
Inverse reinforcement learning (IRL); hierarchical imitation learning (HIL); robotic learning
DOI
10.1109/TNNLS.2023.3305983
CLC number
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Imitation learning (IL) has been proposed to recover the expert policy from demonstrations. However, it is difficult to learn a single monolithic policy for highly complex long-horizon tasks in which the expert policy contains subtask hierarchies. Hierarchical IL (HIL) has therefore been developed to learn a hierarchical policy from expert demonstrations by explicitly modeling the activity structure of a task with the option framework. Existing HIL methods either overlook the causal relationship between the subtask structure and the learned policy, or fail to learn the high-level and low-level policies of the hierarchical framework in conjunction, which leads to suboptimality. In this work, we propose a novel HIL algorithm, hierarchical adversarial inverse reinforcement learning (H-AIRL), which extends a state-of-the-art (SOTA) IL algorithm, AIRL, with the one-step option framework. Specifically, we redefine the AIRL objectives on the extended state and action spaces, and further introduce a directed information term into the objective function to strengthen the causal link between each low-level policy and its corresponding subtask. Moreover, we propose an expectation-maximization (EM) adaptation of our algorithm so that it can be applied to expert demonstrations without subtask annotations, which are more accessible in practice. Theoretical justifications of our algorithm design and evaluations on challenging robotic control tasks demonstrate the superiority of our algorithm over SOTA HIL baselines. The code is available at https://github.com/LucasCJYSDL/HierAIRL.
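To make the construction in the abstract concrete, below is a minimal sketch of an AIRL-style discriminator lifted to the extended spaces of the one-step option framework: the extended state pairs the environment state with the previously active option, and the extended action pairs the newly chosen option with the primitive action. All module names, network shapes, and the cross-entropy estimator used for the directed information bonus are illustrative assumptions for exposition, not the authors' released implementation (see the repository linked above for that).

```python
# Hedged sketch of an H-AIRL-style discriminator on the extended
# state-action space (assumed PyTorch interface, not the reference code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class HAIRLDiscriminator(nn.Module):
    """AIRL-style reward network f on the extended spaces: the extended
    state is (s, z_prev) and the extended action is (z, a)."""

    def __init__(self, state_dim, action_dim, num_options, hidden=64):
        super().__init__()
        self.num_options = num_options
        in_dim = state_dim + 2 * num_options + action_dim
        self.f = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, z_prev, z, a):
        # One-hot encode the previous and the newly chosen option.
        z_prev = F.one_hot(z_prev, self.num_options).float()
        z = F.one_hot(z, self.num_options).float()
        return self.f(torch.cat([s, z_prev, z, a], dim=-1)).squeeze(-1)


def disc_logits(disc, s, z_prev, z, a, log_pi):
    # AIRL parameterizes D = exp(f) / (exp(f) + pi), hence
    # logit(D) = f(s, z_prev, z, a) - log pi(z, a | s, z_prev),
    # where log_pi factorizes over the high- and low-level policies.
    return disc(s, z_prev, z, a) - log_pi


def disc_loss(disc, expert_batch, policy_batch):
    # Standard logistic discriminator loss: expert transitions are
    # labeled 1, transitions from the current hierarchical policy 0.
    e = disc_logits(disc, *expert_batch)
    p = disc_logits(disc, *policy_batch)
    return (F.binary_cross_entropy_with_logits(e, torch.ones_like(e))
            + F.binary_cross_entropy_with_logits(p, torch.zeros_like(p)))


def directed_info_bonus(posterior_logits, z):
    # One assumed estimator of the directed information term: a learned
    # posterior over the active option is rewarded for recovering z,
    # giving a variational lower bound on I(option sequence -> states).
    return -F.cross_entropy(posterior_logits, z, reduction="none")
```

In a full training loop, disc_loss would be minimized over the discriminator parameters while the hierarchical policy is updated with the recovered reward f plus directed_info_bonus, mirroring the adversarial scheme the abstract describes; in the annotation-free EM variant, the expert options (z_prev, z) would be inferred posteriors rather than labels.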
Pages: 1-10
Number of pages: 10