The Wisdom of the Crowd: Reliable Deep Reinforcement Learning Through Ensembles of Q-Functions

Cited by: 8
Authors
Elliott, Daniel L. [1]
Anderson, Charles [2 ]
Affiliations
[1] Lindsay Corp, Omaha, NE 68802 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
Training; Task analysis; Bagging; Stability criteria; Reinforcement learning; Neural networks; Computational modeling; Autonomous systems; Machine learning algorithms
DOI
10.1109/TNNLS.2021.3089425
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that RL is slower and less stable than supervised learning. We explore the possibility that ensemble methods can remedy these shortcomings by investigating a novel technique that harnesses the wisdom of the crowd: it combines the estimates of several Q-function approximators using a simple combination scheme similar to the supervised learning approach known as bagging. Bagging approaches have not yet found widespread adoption in the RL literature, nor has their performance been comprehensively evaluated. Our results show that the proposed approach improves performance on all three tasks and with every RL approach attempted. The primary contribution of this work is a demonstration that the improvement is a direct result of the increased stability of the action portion of the state-action-value function. Subsequent experimentation demonstrates that this stability in learning allows an actor-critic method to find more efficient solutions. Finally, we show that the approach can decrease the time needed to solve problems that require a deep Q-network (DQN) approach.
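The combination scheme the abstract describes, bagging-style averaging of several Q-function approximators, can be sketched as below. The uniform averaging, function names, and the array layout are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def ensemble_q(member_estimates):
    """Combine per-member Q-value estimates by simple averaging.

    member_estimates: array of shape (n_members, n_actions), one row of
    Q-value estimates per ensemble member for the current state.
    (Uniform averaging is an assumption; the paper describes a "simple
    combination scheme similar to bagging".)
    """
    return np.mean(member_estimates, axis=0)

def greedy_action(member_estimates):
    """Act greedily with respect to the combined (ensemble) Q-function."""
    return int(np.argmax(ensemble_q(member_estimates)))

# Hypothetical example: three ensemble members, four actions.
member_qs = np.array([
    [0.1, 0.9, 0.2, 0.3],
    [0.2, 0.7, 0.1, 0.4],
    [0.0, 0.8, 0.3, 0.2],
])
# Individual members may disagree; the averaged estimate is smoother,
# which is the stability effect the paper attributes the improvement to.
```

Here the combined estimate for action 1 is (0.9 + 0.7 + 0.8) / 3 = 0.8, so the greedy action is action 1 even though the members assign it different values individually.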
Pages: 43-51 (9 pages)