The Wisdom of the Crowd: Reliable Deep Reinforcement Learning Through Ensembles of Q-Functions

Times Cited: 8
Authors
Elliott, Daniel L. [1 ]
Anderson, Charles [2 ]
Affiliations
[1] Lindsay Corp, Omaha, NE 68802 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
Training; Task analysis; Bagging; Stability criteria; Reinforcement learning; Neural networks; Computational modeling; Autonomous systems; Machine learning algorithms
DOI
10.1109/TNNLS.2021.3089425
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or the intrinsic value of each encountered state. The cost of this freedom is that RL is slower and less stable than supervised learning. We explore whether ensemble methods can remedy these shortcomings by investigating a novel technique that harnesses the wisdom of crowds: it combines the estimates of several Q-function approximators using a simple combination scheme similar to the supervised learning approach known as bagging. Bagging approaches have not yet found widespread adoption in the RL literature, nor has their performance been examined comprehensively. Our results show that the proposed approach improves performance on all three tasks and with all RL approaches attempted. The primary contribution of this work is a demonstration that the improvement is a direct result of the increased stability of the action portion of the state-action value function. Subsequent experimentation demonstrates that this stability in learning allows an actor-critic method to find more efficient solutions. Finally, we show that the approach can decrease the time needed to solve problems that require a deep Q-network (DQN) approach.
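The bagging-style combination the abstract describes is straightforward to sketch. Below is a minimal, hypothetical illustration, not the authors' implementation; the class name EnsembleQPolicy and the toy linear Q-functions are assumptions for demonstration. Each ensemble member produces per-action value estimates, the estimates are averaged, and the agent acts greedily on the combined values.

```python
import numpy as np

# Minimal sketch (assumed, not the paper's code) of combining Q-function
# approximator estimates with a simple bagging-like averaging scheme.

class EnsembleQPolicy:
    def __init__(self, q_functions):
        # q_functions: callables mapping a state vector to a vector of
        # per-action Q-value estimates, one per ensemble member.
        self.q_functions = q_functions

    def q_values(self, state):
        # "Wisdom of the crowd": average the per-action estimates
        # across all ensemble members.
        return np.mean([q(state) for q in self.q_functions], axis=0)

    def act(self, state):
        # Act greedily with respect to the combined estimate.
        return int(np.argmax(self.q_values(state)))

# Toy usage: three randomly initialized linear Q-functions over 4 actions
# and an 8-dimensional state.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 8)) for _ in range(3)]
ensemble = EnsembleQPolicy([lambda s, W=W: W @ s for W in weights])
state = rng.normal(size=8)
print("combined Q-values:", ensemble.q_values(state))
print("greedy action:", ensemble.act(state))
```

Averaging reduces the variance of the individual estimators, which is the stabilizing effect on the action-value estimates that the abstract credits for the improvement.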
Pages: 43-51
Number of Pages: 9
Related Papers (50 records in total)
  • [21] Zhao, Lei; Wang, Jiadai; Liu, Jiajia; Kato, Nei. Routing for Crowd Management in Smart Cities: A Deep Reinforcement Learning Perspective. IEEE COMMUNICATIONS MAGAZINE, 2019, 57(4): 88-93.
  • [22] Sun, Libo; Qu, Yuke; Qin, Wenhu. Crowd navigation in an unknown and complex environment based on deep reinforcement learning. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33(3-4).
  • [23] Sun, Xueying; Zhang, Qiang; Wei, Yifei; Liu, Mingmin. Risk-Aware Deep Reinforcement Learning for Robot Crowd Navigation. ELECTRONICS, 2023, 12(23).
  • [24] Charalambous, Panayiotis; Pettre, Julien; Vassiliades, Vassilis; Chrysanthou, Yiorgos; Pelechano, Nuria. GREIL-Crowds: Crowd Simulation with Deep Reinforcement Learning and Examples. ACM TRANSACTIONS ON GRAPHICS, 2023, 42(4).
  • [25] Zhou, Zhiqian; Zhu, Pengming; Zeng, Zhiwen; Xiao, Junhao; Lu, Huimin; Zhou, Zongtan. Robot navigation in a crowd by integrating deep reinforcement learning and online planning. APPLIED INTELLIGENCE, 2022, 52(13): 15600-15616.
  • [26] Garcia, R.; Caarls, W. Online weighted Q-ensembles for reduced hyperparameter tuning in reinforcement learning. SOFT COMPUTING, 2024, 28(13-14): 8549-8559.
  • [27] Wang, Cong; Zhang, Qifeng; Tian, Qiyan; Li, Shuo; Wang, Xiaohui; Lane, David; Petillot, Yvan; Wang, Sen. Learning Mobile Manipulation through Deep Reinforcement Learning. SENSORS, 2020, 20(3).
  • [28] Devi, J. Vimala; Kavitha, K. S. Adaptive deep Q learning network with reinforcement learning for crime prediction. EVOLUTIONARY INTELLIGENCE, 2023, 16(2): 685-696.
  • [29] Xu, Zhi-xiong; Cao, Lei; Chen, Xi-liang; Li, Chen-xi; Zhang, Yong-liang; Lai, Jun. Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D(9): 2315-2322.