The Wisdom of the Crowd: Reliable Deep Reinforcement Learning Through Ensembles of Q-Functions

Cited by: 8
Authors
Elliott, Daniel L. [1 ]
Anderson, Charles [2 ]
Affiliations
[1] Lindsay Corp, Omaha, NE 68802 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
Training; Task analysis; Bagging; Stability criteria; Reinforcement learning; Neural networks; Computational modeling; Autonomous systems; Machine learning algorithms
DOI
10.1109/TNNLS.2021.3089425
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to specify the preferred action or intrinsic value of each encountered state. The cost of this freedom is that RL training is slower and less stable than supervised learning. We explore whether ensemble methods can remedy these shortcomings by investigating a novel technique that harnesses the wisdom of crowds: combining the estimates of multiple Q-function approximators with a simple scheme similar to the supervised learning approach known as bagging. Bagging has not yet found widespread adoption in the RL literature, nor has its performance been examined comprehensively. Our results show that the proposed approach improves performance on all three tasks and with all RL methods attempted. The primary contribution of this work is a demonstration that the improvement is a direct result of the increased stability of the action portion of the state-action-value function. Subsequent experiments demonstrate that this stability in learning allows an actor-critic method to find more efficient solutions. Finally, we show that the approach can be used to decrease the time needed to solve problems that require a deep Q-network (DQN) approach.
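A minimal sketch of the kind of bagging-style combination the abstract describes: several Q-function approximators each produce action-value estimates for a state, the estimates are averaged, and the agent acts greedily on the averaged values. The class name QEnsemble, the toy linear members, and the plain mean are illustrative assumptions, not the paper's exact scheme (Python/NumPy).

import numpy as np

class QEnsemble:
    """Bagging-style ensemble of Q-function approximators (illustrative sketch)."""

    def __init__(self, members):
        # Each member is a callable mapping a state to a vector of action values.
        self.members = members

    def q_values(self, state):
        # Combine per-member estimates with a simple mean ("wisdom of the crowd").
        estimates = np.stack([member(state) for member in self.members])
        return estimates.mean(axis=0)

    def act(self, state):
        # Act greedily with respect to the combined estimate.
        return int(np.argmax(self.q_values(state)))

# Usage with toy linear Q-functions over a 4-dimensional state and 2 actions.
rng = np.random.default_rng(0)
members = [(lambda W: (lambda s: W @ s))(rng.normal(size=(2, 4))) for _ in range(5)]
ensemble = QEnsemble(members)
print(ensemble.act(rng.normal(size=4)))  # index of the greedy action under the averaged values

In the paper's setting each member would be a neural-network Q-function trained on its own sample of experience; averaging their action values is what is meant to stabilize the action portion of the state-action-value function.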
Pages: 43-51
Page count: 9
Related Papers
50 records in total
  • [1] Dropout Q-Functions for Doubly Efficient Reinforcement Learning
    Hiraoka, Takuya
    Imagawa, Takahisa
    Hashimoto, Taisei
    Onishi, Takashi
    Tsuruoka, Yoshimasa
    arXiv, 2021,
  • [2] Reinforcement Learning for Synchronization of Heterogeneous Multiagent Systems by Improved Q-Functions
    Li, Jinna
    Yuan, Lin
    Cheng, Weiran
    Chai, Tianyou
    Lewis, Frank L.
    IEEE TRANSACTIONS ON CYBERNETICS, 2024: 6545 - 6558
  • [3] Offline Reinforcement Learning via Policy Regularization and Ensemble Q-Functions
    Wang, Tao
    Xie, Shaorong
    Gao, Mingke
    Chen, Xue
    Zhang, Zhenyu
    Yu, Hang
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 1167 - 1174
  • [4] Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
    Chebotar, Yevgen
    Vuong, Quan
    Irpan, Alex
    Hausman, Karol
    Xia, Fei
    Lu, Yao
    Kumar, Aviral
    Yu, Tianhe
    Herzog, Alexander
    Pertsch, Karl
    Gopalakrishnan, Keerthana
    Ibarz, Julian
    Nachum, Ofir
    Sontakke, Sumedh
    Salazar, Grecia
    Tran, Huong T.
    Peralta, Jodilyn
    Tan, Clayton
    Manjunath, Deeksha
    Singh, Jaspiar
    Zitkovich, Brianna
    Jackson, Tomas
    Rao, Kanishka
    Finn, Chelsea
    Levine, Sergey
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [5] Crowd Simulation by Deep Reinforcement Learning
    Lee, Jaedong
    Won, Jungdam
    Lee, Jehee
    ACM SIGGRAPH CONFERENCE ON MOTION, INTERACTION, AND GAMES (MIG 2018), 2018,
  • [6] Learning continuous Q-functions using generalized Benders cuts
    Warrington, Joseph
    2019 18TH EUROPEAN CONTROL CONFERENCE (ECC), 2019, : 530 - 535
  • [7] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [8] Multi-Agent Exploration for Faster and Reliable Deep Q-Learning Convergence in Reinforcement Learning
    Majumdar, Abhijit
    Benavidez, Patrick
    Jamshidi, Mo
    2018 WORLD AUTOMATION CONGRESS (WAC), 2018, : 222 - 227
  • [9] Gradient boosting in crowd ensembles for Q-learning using weight sharing
    Elliott, D. L.
    Santosh, K. C.
    Anderson, Charles
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, 11 (10) : 2275 - 2287