RAIL: Risk-Averse Imitation Learning (Extended Abstract)

Cited by: 0
Authors
Santara, Anirban [1 ,2 ]
Naik, Abhishek [1 ,3 ,4 ]
Ravindran, Balaraman [3 ,4 ]
Das, Dipankar [5 ]
Mudigere, Dheevatsa [5 ]
Avancha, Sasikanth [5 ]
Kaul, Bharat [5 ]
Affiliations
[1] Intel Labs, Bangalore, Karnataka, India
[2] Indian Inst Technol Kharagpur, Kharagpur, W Bengal, India
[3] Indian Inst Technol Madras, Dept CSE, Madras, Tamil Nadu, India
[4] Indian Inst Technol Madras, Robert Bosch Ctr Data Sci & AI, Madras, Tamil Nadu, India
[5] Intel Labs, Parallel Comp Lab, Bangalore, Karnataka, India
Keywords
Reinforcement Learning; Imitation Learning; Risk Minimization; Conditional-Value-at-Risk; Reliability;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Imitation learning algorithms learn viable policies by imitating an expert's behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert's behavior is available as a fixed set of trajectories. Evaluating policies in terms of the expert's cost function, we observe that on a number of benchmark continuous-control tasks the distribution of trajectory costs is often more heavy-tailed for GAIL agents than for the expert. High-cost trajectories, corresponding to tail-end events of catastrophic failure, are therefore more likely to be encountered by GAIL agents than by the expert. This makes the reliability of GAIL agents questionable when it comes to deployment in risk-sensitive applications such as robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of such tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional-Value-at-Risk (CVaR) of trajectory costs and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail risk than those learned with vanilla GAIL. The proposed RAIL algorithm thus appears to be a potent alternative to GAIL for improved reliability in risk-sensitive applications.
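As a rough, hypothetical illustration of the tail-risk measure named in the abstract (not the authors' implementation), the sketch below estimates the empirical CVaR of a set of per-trajectory costs. The function name `empirical_cvar`, the risk level `alpha`, and the sample data are assumptions introduced here for illustration only.

```python
import numpy as np

def empirical_cvar(costs, alpha=0.9):
    """Estimate CVaR_alpha: the mean cost of the worst (1 - alpha) fraction
    of trajectories. `costs` is an array of per-trajectory costs."""
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, alpha)       # Value-at-Risk at level alpha
    tail = costs[costs >= var]            # tail-end (high-cost) trajectories
    return tail.mean() if tail.size else var

# Hypothetical usage: compare tail risk of an expert and a GAIL-like agent
# whose cost distribution has a heavier right tail (placeholder data).
rng = np.random.default_rng(0)
expert_costs = rng.normal(100.0, 10.0, size=1000)
agent_costs = rng.normal(100.0, 10.0, size=1000) + rng.pareto(3.0, size=1000) * 20.0
print("expert CVaR_0.9:", empirical_cvar(expert_costs, 0.9))
print("agent  CVaR_0.9:", empirical_cvar(agent_costs, 0.9))
```

Under this reading, a RAIL-style agent would be one whose trajectory-cost CVaR is driven down toward the expert's, whereas a heavy-tailed agent like the placeholder above would report a noticeably larger value.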
Pages: 2062 - 2063
Number of pages: 2