The Wisdom of the Crowd: Reliable Deep Reinforcement Learning Through Ensembles of Q-Functions

Times Cited: 8
Authors
Elliott, Daniel L. [1 ]
Anderson, Charles [2 ]
Affiliations
[1] Lindsay Corp, Omaha, NE 68802 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
Training; Task analysis; Bagging; Stability criteria; Reinforcement learning; Neural networks; Computational modeling; Autonomous systems; Machine learning algorithms
DOI
10.1109/TNNLS.2021.3089425
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or the intrinsic value of each encountered state. The cost of this freedom is that RL is slower and less stable than supervised learning. We explore whether ensemble methods can remedy these shortcomings by investigating a novel technique that harnesses the wisdom of crowds: it combines the estimates of several Q-function approximators using a simple combination scheme similar to the supervised learning approach known as bagging. Bagging approaches have not yet found widespread adoption in the RL literature, nor has their performance been examined comprehensively. Our results show that the proposed approach improves performance on all three tasks and with all RL approaches attempted. The primary contribution of this work is a demonstration that the improvement is a direct result of the increased stability of the action portion of the state-action value function. Subsequent experimentation demonstrates that this stability in learning allows an actor-critic method to find more efficient solutions. Finally, we show that the approach can decrease the time needed to solve problems that require a deep Q-network (DQN) approach.
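The bagging-style combination the abstract describes is straightforward to sketch. Below is a minimal, hypothetical illustration, not the authors' implementation; the class name EnsembleQPolicy and the toy linear Q-functions are assumptions for demonstration. Each ensemble member produces per-action value estimates, the estimates are averaged, and the agent acts greedily on the combined values.

```python
import numpy as np

# Minimal sketch (assumed, not the paper's code) of combining Q-function
# approximator estimates with a simple bagging-like averaging scheme.

class EnsembleQPolicy:
    def __init__(self, q_functions):
        # q_functions: callables mapping a state vector to a vector of
        # per-action Q-value estimates, one per ensemble member.
        self.q_functions = q_functions

    def q_values(self, state):
        # "Wisdom of the crowd": average the per-action estimates
        # across all ensemble members.
        return np.mean([q(state) for q in self.q_functions], axis=0)

    def act(self, state):
        # Act greedily with respect to the combined estimate.
        return int(np.argmax(self.q_values(state)))

# Toy usage: three randomly initialized linear Q-functions over 4 actions
# and an 8-dimensional state.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)) for _ in range(3)]
ensemble = EnsembleQPolicy([lambda s, W=W: W @ s for W in weights])
state = rng.normal(size=8)
print("combined Q-values:", ensemble.q_values(state))
print("greedy action:", ensemble.act(state))
```

Averaging reduces the variance of the individual estimators, which is the stabilizing effect on the action-value estimates that the abstract credits for the improvement.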
Pages: 43-51
Number of Pages: 9
Related Papers (50 records in total)
  • [21] Zhao, Lei; Wang, Jiadai; Liu, Jiajia; Kato, Nei. Routing for Crowd Management in Smart Cities: A Deep Reinforcement Learning Perspective. IEEE COMMUNICATIONS MAGAZINE, 2019, 57(4): 88-93.
  • [22] Sun, Libo; Qu, Yuke; Qin, Wenhu. Crowd navigation in an unknown and complex environment based on deep reinforcement learning. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33(3-4).
  • [23] Sun, Xueying; Zhang, Qiang; Wei, Yifei; Liu, Mingmin. Risk-Aware Deep Reinforcement Learning for Robot Crowd Navigation. ELECTRONICS, 2023, 12(23).
  • [24] Charalambous, Panayiotis; Pettre, Julien; Vassiliades, Vassilis; Chrysanthou, Yiorgos; Pelechano, Nuria. GREIL-Crowds: Crowd Simulation with Deep Reinforcement Learning and Examples. ACM TRANSACTIONS ON GRAPHICS, 2023, 42(4).
  • [25] Zhou, Zhiqian; Zhu, Pengming; Zeng, Zhiwen; Xiao, Junhao; Lu, Huimin; Zhou, Zongtan. Robot navigation in a crowd by integrating deep reinforcement learning and online planning. APPLIED INTELLIGENCE, 2022, 52(13): 15600-15616.
  • [26] Garcia, R.; Caarls, W. Online weighted Q-ensembles for reduced hyperparameter tuning in reinforcement learning. SOFT COMPUTING, 2024, 28(13-14): 8549-8559.
  • [27] Wang, Cong; Zhang, Qifeng; Tian, Qiyan; Li, Shuo; Wang, Xiaohui; Lane, David; Petillot, Yvan; Wang, Sen. Learning Mobile Manipulation through Deep Reinforcement Learning. SENSORS, 2020, 20(3).
  • [28] Devi, J. Vimala; Kavitha, K. S. Adaptive deep Q learning network with reinforcement learning for crime prediction. EVOLUTIONARY INTELLIGENCE, 2023, 16(2): 685-696.
  • [29] Xu, Zhi-xiong; Cao, Lei; Chen, Xi-liang; Li, Chen-xi; Zhang, Yong-liang; Lai, Jun. Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D(9): 2315-2322.