Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models

Citations: 0
Authors
Ghadirzadeh, Ali [1]
Poklukar, Petra [2]
Arndt, Karol [3]
Finn, Chelsea [1]
Kyrki, Ville [3]
Kragic, Danica [2]
Björkman, Mårten [2]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] KTH Royal Inst Technol, Stockholm, Sweden
[3] Aalto Univ, Espoo, Finland
Keywords
reinforcement learning; policy search; robot learning; deep generative models; representation learning; primitives
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluation of generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that GenRL is the only method which can safely and efficiently solve the robotics tasks compared to two state-of-the-art RL methods.
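The two-part split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear-Gaussian sub-policy, the random tanh decoder standing in for a pretrained generative model, and all names and dimensions (`sub_policy`, `decode_actions`, `STATE_DIM`, `LATENT_DIM`, `HORIZON`, `ACTION_DIM`) are assumptions chosen only to show the data flow from state to latent action to motor-action sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
STATE_DIM, LATENT_DIM, HORIZON, ACTION_DIM = 4, 2, 10, 3

# Assumption: a random linear map followed by tanh stands in for the
# decoder of a generative model trained beforehand on valid action sequences.
W_dec = rng.normal(size=(LATENT_DIM, HORIZON * ACTION_DIM))

def decode_actions(z):
    """Part (ii): map a latent action to a bounded sequence of motor actions."""
    return np.tanh(z @ W_dec).reshape(HORIZON, ACTION_DIM)

# Assumption: a linear-Gaussian sub-policy; in practice this would be a
# deep network whose parameters are trained with RL.
W_pi = rng.normal(scale=0.1, size=(STATE_DIM, LATENT_DIM))
LOG_STD = np.full(LATENT_DIM, -1.0)

def sub_policy(state):
    """Part (i): return mean and std of a Gaussian over the action latent."""
    return state @ W_pi, np.exp(LOG_STD)

def act(state):
    """Sample a latent action, then decode it into a motor-action sequence."""
    mean, std = sub_policy(state)
    z = mean + std * rng.standard_normal(LATENT_DIM)
    return z, decode_actions(z)

state = rng.normal(size=STATE_DIM)
z, actions = act(state)
print(z.shape, actions.shape)
```

In this sketch exploration happens only in the low-dimensional latent space, and every sampled latent decodes to a bounded action sequence, which is one way to read the abstract's claim about safe exploration.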
Pages: 37
Related papers
50 in total
  • [1] Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models
    Ghadirzadeh, Ali
    Poklukar, Petra
    Arndt, Karol
    Finn, Chelsea
    Kyrki, Ville
    Kragic, Danica
    Björkman, Mårten
    Journal of Machine Learning Research, 2022, 23
  • [2] Learning Urban Driving Policies using Deep Reinforcement Learning
    Agarwal, Tanmay
    Arora, Hitesh
    Schneider, Jeff
    2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021: 607-614
  • [3] Distributed Training for Deep Learning Models On An Edge Computing Network Using Shielded Reinforcement Learning
    Sen, Tanmoy
    Shen, Haiying
    2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2022), 2022: 581-591
  • [4] Integrating Multiple Policies for Person-Following Robot Training Using Deep Reinforcement Learning
    Dewa, Chandra Kusuma
    Miura, Jun
    IEEE ACCESS, 2021, 9: 75526-75541
  • [5] Learning Deep Generative Models
    Salakhutdinov, Ruslan
    ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, VOL 2, 2015, 2: 361-385
  • [6] Enhancing Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization
    Du, Hongyang
    Zhang, Ruichen
    Liu, Yinqiu
    Wang, Jiacheng
    Lin, Yijing
    Li, Zonghang
    Niyato, Dusit
    Kang, Jiawen
    Xiong, Zehui
    Cui, Shuguang
    Ai, Bo
    Zhou, Haibo
    Kim, Dong In
    IEEE Communications Surveys and Tutorials, 2024, 26 (4): 2611-2646
  • [7] De Novo Drug Design Using Reinforcement Learning with Graph-Based Deep Generative Models
    Atance, Sara Romeo
    Diez, Juan Viguera
    Engkvist, Ola
    Olsson, Simon
    Mercado, Rocio
    JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2022, 62 (20): 4863-4872
  • [8] On Training Flexible Robots using Deep Reinforcement Learning
    Dwiel, Zach
    Candadai, Madhavun
    Phielipp, Mariano
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019: 4666-4671
  • [9] Deep Predictive Policy Training using Reinforcement Learning
    Ghadirzadeh, Ali
    Maki, Atsuto
    Kragic, Danica
    Björkman, Mårten
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017: 2351-2358
  • [10] Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models
    Wu, Yuchen
    Mozifian, Melissa
    Shkurti, Florian
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021: 6628-6634