Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models

Cited by: 0
Authors
Ghadirzadeh, Ali [1]
Poklukar, Petra [2]
Arndt, Karol [3]
Finn, Chelsea [1]
Kyrki, Ville [3]
Kragic, Danica [2]
Bjorkman, Marten [2]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] KTH Royal Inst Technol, Stockholm, Sweden
[3] Aalto Univ, Espoo, Finland
Keywords
reinforcement learning; policy search; robot learning; deep generative models; representation learning; PRIMITIVES;
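DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract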
We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluation of generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that GenRL is the only method which can safely and efficiently solve the robotics tasks compared to two state-of-the-art RL methods.
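The two-part decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`sub_policy`, `sample_latent`, `decoder`, `act`), the dimensions, and the closed-form toy functions are all assumptions made for illustration. In particular, the `decoder` here is a fixed placeholder, whereas the abstract describes a generative model trained without supervision on valid motor-action sequences.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
LATENT_DIM = 2   # size of the action latent variable
HORIZON = 5      # number of time steps in one motor-action sequence
ACTION_DIM = 3   # number of motor commands per time step


def sub_policy(state):
    """Part (i): map a state to a diagonal Gaussian over the action latent.

    A toy closed-form function stands in for a trained network.
    """
    mean = [math.tanh(sum(state) + i) for i in range(LATENT_DIM)]
    std = [0.1] * LATENT_DIM
    return mean, std


def sample_latent(mean, std, rng):
    """Draw one latent action from the sub-policy's distribution."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def decoder(latent):
    """Part (ii): map a latent action to a sequence of motor actions.

    Placeholder for a generative model; tanh keeps every command in [-1, 1].
    """
    return [
        [math.tanh(sum(latent) * (t + 1) / HORIZON + j) for j in range(ACTION_DIM)]
        for t in range(HORIZON)
    ]


def act(state, rng):
    """Compose the two parts: state -> latent distribution -> action sequence."""
    mean, std = sub_policy(state)
    z = sample_latent(mean, std, rng)
    return decoder(z)


rng = random.Random(0)
trajectory = act([0.2, -0.1], rng)
```

In this sketch, exploration happens only in the low-dimensional latent space, and every sampled latent decodes to a bounded action sequence. That is one plausible reading of how restricting the policy to the generative model's outputs could support safe exploration, under the assumptions stated above.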
Pages: 37