The Wisdom of the Crowd: Reliable Deep Reinforcement Learning Through Ensembles of Q-Functions

Cited by: 8
Authors
Elliott, Daniel L. [1]
Anderson, Charles [2 ]
Affiliations
[1] Lindsay Corp, Omaha, NE 68802 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
Training; Task analysis; Bagging; Stability criteria; Reinforcement learning; Neural networks; Computational modeling; Autonomous systems; Machine learning algorithms
DOI
10.1109/TNNLS.2021.3089425
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to know the preferred action or intrinsic value of each encountered state. The cost of this freedom is that RL is slower and less stable than supervised learning. We explore the possibility that ensemble methods can remedy these shortcomings by investigating a novel technique that harnesses the wisdom of the crowd: it combines the estimates of several Q-function approximators using a simple combination scheme similar to the supervised learning approach known as bagging. Bagging approaches have not yet found widespread adoption in the RL literature, nor has their performance been comprehensively evaluated. Our results show that the proposed approach improves performance on all three tasks and with every RL approach attempted. The primary contribution of this work is a demonstration that the improvement is a direct result of the increased stability of the action portion of the state-action-value function. Subsequent experimentation demonstrates that this stability in learning allows an actor-critic method to find more efficient solutions. Finally, we show that the approach can decrease the time needed to solve problems that require a deep Q-network (DQN) approach.
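The combination scheme the abstract describes, bagging-style averaging of several Q-function approximators, can be sketched as below. The uniform averaging, function names, and the array layout are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def ensemble_q(member_estimates):
    """Combine per-member Q-value estimates by simple averaging.

    member_estimates: array of shape (n_members, n_actions), one row of
    Q-value estimates per ensemble member for the current state.
    (Uniform averaging is an assumption; the paper describes a "simple
    combination scheme similar to bagging".)
    """
    return np.mean(member_estimates, axis=0)

def greedy_action(member_estimates):
    """Act greedily with respect to the combined (ensemble) Q-function."""
    return int(np.argmax(ensemble_q(member_estimates)))

# Hypothetical example: three ensemble members, four actions.
member_qs = np.array([
    [0.1, 0.9, 0.2, 0.3],
    [0.2, 0.7, 0.1, 0.4],
    [0.0, 0.8, 0.3, 0.2],
])
# Individual members may disagree; the averaged estimate is smoother,
# which is the stability effect the paper attributes the improvement to.
```

Here the combined estimate for action 1 is (0.9 + 0.7 + 0.8) / 3 = 0.8, so the greedy action is action 1 even though the members assign it different values individually.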
Pages: 43-51 (9 pages)