Improving Reinforcement Learning Pre-Training with Variational Dropout

Cited by: 0
Authors
Blau, Tom [1 ]
Ott, Lionel [1 ]
Ramos, Fabio [1 ]
Affiliations
[1] Univ Sydney, Sch Informat Technol, Sydney, NSW, Australia
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning has been very successful at learning control policies for robotic agents performing various tasks, such as driving around a track, navigating a maze, and bipedal locomotion. A significant drawback of reinforcement learning methods is that they require a large number of data points to learn good policies, a limitation known as poor data efficiency or poor sample efficiency. One approach to improving sample efficiency is supervised pre-training of policies to directly clone the behavior of an expert, but this suffers from poor generalization far from the training data. We propose to improve on this by using Gaussian dropout networks with a regularization term based on variational inference in the pre-training step. We show that this initializes policy parameters to significantly better values than standard supervised learning or random initialization, thus greatly reducing sample complexity compared with state-of-the-art methods and enabling an RL algorithm to learn optimal policies for high-dimensional continuous control problems in a practical time frame.
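The core idea described in the abstract, behavior cloning with Gaussian dropout layers whose noise rates are learned under a variational-inference regularizer, can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: it uses the Gaussian dropout parameterization with the local reparameterization trick (Kingma et al., 2015) and the approximate KL term of Molchanov et al. (2017). The layer name `VariationalDropoutLinear`, the network sizes, and the weighting `beta` are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalDropoutLinear(nn.Module):
    """Linear layer with learned per-weight multiplicative Gaussian noise."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # log(alpha): learned per-weight noise variance, initialized small.
        self.log_alpha = nn.Parameter(torch.full((out_features, in_features), -3.0))

    def forward(self, x):
        mean = F.linear(x, self.weight, self.bias)
        if not self.training:
            return mean
        # Local reparameterization trick: sample noisy pre-activations directly.
        var = F.linear(x ** 2, self.log_alpha.exp() * self.weight ** 2)
        return mean + var.clamp_min(1e-8).sqrt() * torch.randn_like(mean)

    def kl(self):
        # Approximate KL(q(w)||p(w)) for a log-uniform prior (Molchanov et al., 2017).
        k1, k2, k3 = 0.63576, 1.87320, 1.48695
        la = self.log_alpha
        neg_kl = k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1
        return -neg_kl.sum()

obs_dim, act_dim = 11, 3  # illustrative dimensions for a continuous control task
policy = nn.Sequential(
    VariationalDropoutLinear(obs_dim, 64), nn.Tanh(),
    VariationalDropoutLinear(64, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def pretrain_step(states, expert_actions, beta=1e-4):
    """One behavior-cloning step: imitation loss plus the VI regularizer."""
    optimizer.zero_grad()
    bc_loss = F.mse_loss(policy(states), expert_actions)
    kl = sum(m.kl() for m in policy.modules()
             if isinstance(m, VariationalDropoutLinear))
    loss = bc_loss + beta * kl
    loss.backward()
    optimizer.step()
    return loss.item()
```

Under this reading, pre-training on expert (state, action) pairs yields the initial policy parameters for the subsequent RL phase, which is where the abstract's claimed reduction in sample complexity would come from.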
Pages: 4115-4122
Page count: 8
Related papers
50 records in total
  • [1] Pre-training Framework for Improving Learning Speed of Reinforcement Learning based Autonomous Vehicles
    Kim, Jung-Jae; Cha, Si-Ho; Ryu, Minwoo; Jo, Minho
    2019 International Conference on Electronics, Information, and Communication (ICEIC), 2019: 321-322
  • [2] Supervised pre-training for improved stability in deep reinforcement learning
    Jang, Sooyoung; Kim, Hyung-Il
    ICT Express, 2023, 9(1): 51-56
  • [3] RePreM: Representation Pre-training with Masked Model for Reinforcement Learning
    Cai, Yuanying; Zhang, Chuheng; Shen, Wei; Zhang, Xuyun; Ruan, Wenjie; Huang, Longbo
    Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol. 37, No. 6, 2023: 6879-6887
  • [4] Improving Fractal Pre-training
    Anderson, Connor; Farrell, Ryan
    2022 IEEE Winter Conference on Applications of Computer Vision (WACV 2022), 2022: 2412-2421
  • [5] Pre-training with asynchronous supervised learning for reinforcement learning based autonomous driving
    Wang, Yunpeng; Zheng, Kunxian; Tian, Daxin; Duan, Xuting; Zhou, Jianshan
    Frontiers of Information Technology & Electronic Engineering, 2021, 22(5): 673-686
  • [6] On the Effect of Pre-training for Transformer in Different Modality on Offline Reinforcement Learning
    Takagi, Shiro
    Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022
  • [7] Reinforcement Learning with Action-Free Pre-Training from Videos
    Seo, Younggyo; Lee, Kimin; James, Stephen; Abbeel, Pieter
    International Conference on Machine Learning, Vol. 162, 2022: 19561-19579
  • [8] Improving fault localization with pre-training
    Zhang, Zhuo; Li, Ya; Xue, Jianxin; Mao, Xiaoguang
    Frontiers of Computer Science, 2024, 18(1)
  • [9] APD: Learning Diverse Behaviors for Reinforcement Learning Through Unsupervised Active Pre-Training
    Zeng, Kailin; Zhang, QiYuan; Chen, Bin; Liang, Bin; Yang, Jun
    IEEE Robotics and Automation Letters, 2022, 7(4): 12251-12258