Distributionally Robust Imitation Learning

Cited by: 0
Authors
Bashiri, Mohammad Ali [1 ]
Ziebart, Brian D. [1 ]
Zhang, Xinhua [1 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Funding
National Science Foundation (USA)
Keywords
Optimization
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider the imitation learning problem of learning a policy in a Markov Decision Process (MDP) setting where the reward function is not given, but demonstrations from experts are available. Although the goal of imitation learning is to learn a policy that produces behaviors nearly as good as the experts' for a desired task, assumptions of consistent optimality for demonstrated behaviors are often violated in practice. Finding a policy that is distributionally robust against noisy demonstrations, based on an adversarial construction, potentially solves this problem by avoiding optimistic generalizations of the demonstrated data. This paper studies Distributionally Robust Imitation Learning (DROIL) and establishes a close connection between DROIL and Maximum Entropy Inverse Reinforcement Learning. We show that DROIL can be seen as a framework that maximizes a generalized concept of entropy. We develop a novel approach to transform the objective function into a convex optimization problem over a polynomial number of variables for a class of loss functions that are additive over state and action spaces. Our approach lets us optimize both stationary and non-stationary policies and, unlike prevalent previous methods, it does not require repeatedly solving an inner reinforcement learning problem. We experimentally show the significant benefits of DROIL's new optimization method on synthetic data and a highway driving environment.
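As a rough illustration only (not taken from the paper itself; the loss \ell, feature map \phi, and demonstration feature expectations \tilde{\phi} are assumed notation), the adversarial construction described in the abstract can be sketched as a minimax problem in which the learner's policy \hat{\pi} is evaluated against a worst-case policy \check{\pi} that is constrained to match the demonstrated feature expectations:

  \hat{\pi}^{*} \;=\; \arg\min_{\hat{\pi}} \; \max_{\check{\pi}\,:\,\mathbb{E}_{\check{\pi}}[\phi(s,a)] = \tilde{\phi}} \; \mathbb{E}_{\hat{\pi},\,\check{\pi}}\bigl[\ell(\hat{\pi},\check{\pi})\bigr]

Under a logarithmic loss, the inner maximization is presumably solved by the maximum-entropy distribution consistent with the feature-matching constraint, which would account for the stated connection to Maximum Entropy Inverse Reinforcement Learning; for losses that are additive over states and actions, the abstract indicates that the whole problem can be reformulated as a convex program over a polynomial number of variables (plausibly occupancy-measure-like quantities), avoiding a repeated inner reinforcement learning step.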
Pages: 14
Related Papers (50 in total)
  • [1] Distributionally Robust Behavioral Cloning for Robust Imitation Learning. Panaganti, Kishan; Xu, Zaiyan; Kalathil, Dileep; Ghavamzadeh, Mohammad. 2023 62nd IEEE Conference on Decision and Control (CDC), 2023: 1342-1347.
  • [2] Distributionally Robust Q-Learning. Liu, Zijian; Bai, Qinxun; Blanchet, Jose; Dong, Perry; Xu, Wei; Zhou, Zhengqing; Zhou, Zhengyuan. International Conference on Machine Learning, Vol. 162, 2022.
  • [3] Efficient Generalization with Distributionally Robust Learning. Ghosh, Soumyadip; Squillante, Mark S.; Wollega, Ebisa D. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  • [4] Does Distributionally Robust Supervised Learning Give Robust Classifiers? Hu, Weihua; Niu, Gang; Sato, Issei; Sugiyama, Masashi. International Conference on Machine Learning, Vol. 80, 2018.
  • [5] Distributionally Robust Learning With Stable Adversarial Training. Liu, Jiashuo; Shen, Zheyan; Cui, Peng; Zhou, Linjun; Kuang, Kun; Li, Bo. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(11): 11288-11300.
  • [6] A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization. Chen, Ruidi; Paschalidis, Ioannis Ch. Journal of Machine Learning Research, 2018, 19.
  • [7] Doubly Robust Distributionally Robust Off-Policy Evaluation and Learning. Kallus, Nathan; Mao, Xiaojie; Wang, Kaiwen; Zhou, Zhengyuan. International Conference on Machine Learning, Vol. 162, 2022: 10598-10632.
  • [8] Distributionally Robust Edge Learning with Dirichlet Process Prior. Zhang, Zhaofeng; Chen, Yue; Zhang, Junshan. 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS), 2020: 798-808.
  • [9] Distributionally Robust Skeleton Learning of Discrete Bayesian Networks. Li, Yeshu; Ziebart, Brian D. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [10] Distributionally Robust Federated Learning for Mobile Edge Networks. Le, Long Tan; Nguyen, Tung-Anh; Nguyen, Tuan-Dung; Tran, Nguyen H.; Truong, Nguyen Binh; Vo, Phuong L.; Hung, Bui Thanh; Le, Tuan Anh. Mobile Networks & Applications, 2024, 29(1): 262-272.