Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models

Cited by: 0
Authors
Ghadirzadeh, Ali [1 ]
Poklukar, Petra [2 ]
Arndt, Karol [3 ]
Finn, Chelsea [1 ]
Kyrki, Ville [3 ]
Kragic, Danica [2 ]
Bjorkman, Marten [2 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] KTH Royal Inst Technol, Stockholm, Sweden
[3] Aalto Univ, Espoo, Finland
Keywords
reinforcement learning; policy search; robot learning; deep generative models; representation learning; primitives
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology; computer technology]
Discipline classification code
0812
Abstract
We present a data-efficient framework for solving sequential decision-making problems that combines reinforcement learning (RL) with latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable so that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates data inefficiency because it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models, which allow us to predict the performance of RL policy training before the actual training on a physical robot. We experimentally determine the characteristics of generative models that most influence the performance of the final policy on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared to two state-of-the-art RL methods, GenRL is the only method that can solve both robotics tasks safely and efficiently.
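To make the two-part decomposition concrete, here is a minimal PyTorch sketch of the kind of architecture the abstract describes: a sub-policy that maps a state to a Gaussian distribution over the action latent variable, and a generative decoder, assumed pretrained on valid motor trajectories, that turns a latent sample into a full motor-action sequence. All module names, dimensions, and the REINFORCE-style update are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class LatentSubPolicy(nn.Module):
    # Part (i): state -> distribution over the action latent variable.
    def __init__(self, state_dim, latent_dim, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_std = nn.Linear(hidden, latent_dim)

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mu(h), self.log_std(h).exp())

class ActionDecoder(nn.Module):
    # Part (ii): generative model decoding a latent into a sequence of motor
    # actions; in GenRL this would be trained unsupervised (e.g., as a VAE
    # decoder) on valid trajectories, then frozen during policy search.
    def __init__(self, latent_dim, action_dim, horizon, hidden=128):
        super().__init__()
        self.horizon, self.action_dim = horizon, action_dim
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * action_dim))

    def forward(self, z):
        return self.net(z).view(-1, self.horizon, self.action_dim)

# One rollout: sample a latent, decode an open-loop action sequence,
# execute it, and update the sub-policy from the episodic reward.
state_dim, latent_dim, action_dim, horizon = 10, 4, 7, 50
sub_policy = LatentSubPolicy(state_dim, latent_dim)
decoder = ActionDecoder(latent_dim, action_dim, horizon)  # assumed pretrained

state = torch.randn(1, state_dim)
dist = sub_policy(state)
z = dist.rsample()                # reparameterized latent sample
actions = decoder(z)              # (1, horizon, action_dim) motor commands
# reward = env.execute(actions)   # hypothetical robot/environment call
# loss = -dist.log_prob(z).sum() * (reward - baseline)  # REINFORCE-style

Because exploration happens in the low-dimensional latent space and every decoded sequence resembles the training trajectories, the robot never executes arbitrary motor commands, which is the safety and data-efficiency argument the abstract makes.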
Pages: 37
Related papers
50 records in total
  • [31] Example-guided learning of stochastic human driving policies using deep reinforcement learning
    Emuna, Ran
    Duffney, Rotem
    Borowsky, Avinoam
    Biess, Armin
    NEURAL COMPUTING & APPLICATIONS, 2023, 35(23): 16791-16804
  • [32] LFQ: Online Learning of Per-flow Queuing Policies using Deep Reinforcement Learning
    Bachl, Maximilian
    Fabini, Joachim
    Zseby, Tanja
    PROCEEDINGS OF THE 2020 IEEE 45TH CONFERENCE ON LOCAL COMPUTER NETWORKS (LCN 2020), 2020: 417-420
  • [34] Boosting Deep Reinforcement Learning Agents with Generative Data Augmentation
    Papagiannis, Tasos
    Alexandridis, Georgios
    Stafylopatis, Andreas
    APPLIED SCIENCES-BASEL, 2024, 14(1)
  • [35] Reinforcement Learning with Deep Energy-Based Policies
    Haarnoja, Tuomas
    Tang, Haoran
    Abbeel, Pieter
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017
  • [36] Autoregressive Policies for Continuous Control Deep Reinforcement Learning
    Korenkevych, Dmytro
    Mahmood, A. Rupam
    Vasan, Gautham
    Bergstra, James
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 2754-2762
  • [37] The State of Sparse Training in Deep Reinforcement Learning
    Graesser, Laura
    Evci, Utku
    Elsen, Erich
    Castro, Pablo Samuel
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [38] Counterfactual state explanations for reinforcement learning agents via generative deep learning
    Olson, Matthew L.
    Khanna, Roli
    Neal, Lawrence
    Li, Fuxin
    Wong, Weng-Keen
    ARTIFICIAL INTELLIGENCE, 2021, 295
  • [39] Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models
    Chua, Kurtland
    Calandra, Roberto
    McAllister, Rowan
    Levine, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018
  • [40] Dual Control by Reinforcement Learning Using Deep Hyperstate Transition Models
    Rosdahl, Christian
    Cervin, Anton
    Bernhardsson, Bo
    IFAC PAPERSONLINE, 2022, 55(12): 395-401