RAIL: Risk-Averse Imitation Learning (Extended Abstract)

Cited by: 0
Authors
Santara, Anirban [1 ,2 ]
Naik, Abhishek [1 ,3 ,4 ]
Ravindran, Balaraman [3 ,4 ]
Das, Dipankar [5 ]
Mudigere, Dheevatsa [5 ]
Avancha, Sasikanth [5 ]
Kaul, Bharat [5 ]
Affiliations
[1] Intel Labs, Bangalore, Karnataka, India
[2] Indian Inst Technol Kharagpur, Kharagpur, W Bengal, India
[3] Indian Inst Technol Madras, Dept CSE, Madras, Tamil Nadu, India
[4] Indian Inst Technol Madras, Robert Bosch Ctr Data Sci & AI, Madras, Tamil Nadu, India
[5] Intel Labs, Parallel Comp Lab, Bangalore, Karnataka, India
Keywords
Reinforcement Learning; Imitation Learning; Risk Minimization; Conditional-Value-at-Risk; Reliability
DOI
Not available
CLC Number
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories. Evaluating GAIL-learned policies in terms of the expert's cost function, we observe that the distribution of trajectory costs is often more heavy-tailed for GAIL agents than for the expert on a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by GAIL agents than by the expert. This makes the reliability of GAIL agents questionable when it comes to deployment in risk-sensitive applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of such tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional-Value-at-Risk (CVaR) of trajectory costs and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that policies learned with RAIL show lower tail risk than those learned with vanilla GAIL. The proposed RAIL algorithm thus appears to be a potent alternative to GAIL for improved reliability in risk-sensitive applications.
Pages: 2062-2063
Page count: 2
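
The abstract quantifies tail risk by the Conditional-Value-at-Risk (CVaR) of trajectory costs: for a confidence level alpha, CVaR_alpha is the expected cost over the worst (1 - alpha) fraction of trajectories, i.e. the mean of the costs beyond the alpha-quantile (the Value-at-Risk). The following is a minimal Python sketch of how such a tail-risk measure can be estimated from sampled trajectory costs; the function name empirical_cvar, the choice alpha = 0.9, and the synthetic cost distributions are illustrative assumptions, and the RAIL algorithm itself minimizes a CVaR term within the GAIL framework rather than only measuring it.

    import numpy as np

    def empirical_cvar(trajectory_costs, alpha=0.9):
        """Empirical CVaR of a batch of trajectory costs.

        CVaR_alpha is the mean cost of the worst (1 - alpha) fraction of
        trajectories, i.e. the expected cost in the tail beyond the
        alpha-quantile (Value-at-Risk).
        """
        costs = np.asarray(trajectory_costs, dtype=float)
        var = np.quantile(costs, alpha)      # Value-at-Risk: alpha-quantile of the costs
        tail = costs[costs >= var]           # tail-end (high-cost, catastrophic) trajectories
        return tail.mean()

    # Illustrative comparison on synthetic rollout costs:
    # a heavy-tailed cost distribution (standing in for a less reliable agent)
    # versus a light-tailed one (standing in for the expert).
    agent_costs = np.random.lognormal(mean=0.0, sigma=1.0, size=1000)
    expert_costs = np.random.normal(loc=1.0, scale=0.2, size=1000)
    print("heavy-tailed CVaR_0.9:", empirical_cvar(agent_costs, 0.9))
    print("light-tailed CVaR_0.9:", empirical_cvar(expert_costs, 0.9))

In this sketch the heavy-tailed cost distribution yields a markedly larger CVaR than the light-tailed one, mirroring the abstract's observation that GAIL agents encounter high-cost, tail-end trajectories more often than the expert does.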