The Wisdom of the Crowd: Reliable Deep Reinforcement Learning Through Ensembles of Q-Functions

Cited by: 8
Authors
Elliott, Daniel L. [1 ]
Anderson, Charles [2 ]
Affiliations
[1] Lindsay Corp, Omaha, NE 68802 USA
[2] Colorado State Univ, Dept Comp Sci, Ft Collins, CO 80523 USA
Keywords
Training; Task analysis; Bagging; Stability criteria; Reinforcement learning; Neural networks; Computational modeling; Autonomous systems; Machine learning algorithms
DOI
10.1109/TNNLS.2021.3089425
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) agents learn by exploring the environment and then exploiting what they have learned. This frees human trainers from having to specify the preferred action or intrinsic value of each encountered state. The cost of this freedom is that RL training is slower and less stable than supervised learning. We explore whether ensemble methods can remedy these shortcomings by investigating a novel technique that harnesses the wisdom of crowds: combining the estimates of multiple Q-function approximators with a simple scheme similar to the supervised learning approach known as bagging. Bagging has not yet found widespread adoption in the RL literature, nor has its performance been examined comprehensively. Our results show that the proposed approach improves performance on all three tasks and with all RL methods attempted. The primary contribution of this work is a demonstration that the improvement is a direct result of the increased stability of the action portion of the state-action-value function. Subsequent experiments demonstrate that this stability in learning allows an actor-critic method to find more efficient solutions. Finally, we show that the approach can be used to decrease the time needed to solve problems that require a deep Q-network (DQN) approach.
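A minimal sketch of the kind of bagging-style combination the abstract describes: several Q-function approximators each produce action-value estimates for a state, the estimates are averaged, and the agent acts greedily on the averaged values. The class name QEnsemble, the toy linear members, and the plain mean are illustrative assumptions, not the paper's exact scheme (Python/NumPy).

import numpy as np

class QEnsemble:
    """Bagging-style ensemble of Q-function approximators (illustrative sketch)."""

    def __init__(self, members):
        # Each member is a callable mapping a state to a vector of action values.
        self.members = members

    def q_values(self, state):
        # Combine per-member estimates with a simple mean ("wisdom of the crowd").
        estimates = np.stack([member(state) for member in self.members])
        return estimates.mean(axis=0)

    def act(self, state):
        # Act greedily with respect to the combined estimate.
        return int(np.argmax(self.q_values(state)))

# Usage with toy linear Q-functions over a 4-dimensional state and 2 actions.
rng = np.random.default_rng(0)
members = [(lambda W: (lambda s: W @ s))(rng.normal(size=(2, 4))) for _ in range(5)]
ensemble = QEnsemble(members)
print(ensemble.act(rng.normal(size=4)))  # index of the greedy action under the averaged values

In the paper's setting each member would be a neural-network Q-function trained on its own sample of experience; averaging their action values is what is meant to stabilize the action portion of the state-action-value function.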
Pages: 43-51
Page count: 9
Related Papers
50 records in total
  • [1] Dropout Q-Functions for Doubly Efficient Reinforcement Learning
    Hiraoka, Takuya
    Imagawa, Takahisa
    Hashimoto, Taisei
    Onishi, Takashi
    Tsuruoka, Yoshimasa
    arXiv, 2021,
  • [2] Reinforcement Learning for Synchronization of Heterogeneous Multiagent Systems by Improved Q-Functions
    Li, Jinna
    Yuan, Lin
    Cheng, Weiran
    Chai, Tianyou
    Lewis, Frank L.
    IEEE TRANSACTIONS ON CYBERNETICS, 2024: 6545 - 6558
  • [3] Offline Reinforcement Learning via Policy Regularization and Ensemble Q-Functions
    Wang, Tao
    Xie, Shaorong
    Gao, Mingke
    Chen, Xue
    Zhang, Zhenyu
    Yu, Hang
    2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022, : 1167 - 1174
  • [4] Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
    Chebotar, Yevgen
    Vuong, Quan
    Irpan, Alex
    Hausman, Karol
    Xia, Fei
    Lu, Yao
    Kumar, Aviral
    Yu, Tianhe
    Herzog, Alexander
    Pertsch, Karl
    Gopalakrishnan, Keerthana
    Ibarz, Julian
    Nachum, Ofir
    Sontakke, Sumedh
    Salazar, Grecia
    Tran, Huong T.
    Peralta, Jodilyn
    Tan, Clayton
    Manjunath, Deeksha
    Singh, Jaspiar
    Zitkovich, Brianna
    Jackson, Tomas
    Rao, Kanishka
    Finn, Chelsea
    Levine, Sergey
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [5] Crowd Simulation by Deep Reinforcement Learning
    Lee, Jaedong
    Won, Jungdam
    Lee, Jehee
    ACM SIGGRAPH CONFERENCE ON MOTION, INTERACTION, AND GAMES (MIG 2018), 2018,
  • [6] Learning continuous Q-functions using generalized Benders cuts
    Warrington, Joseph
    2019 18TH EUROPEAN CONTROL CONFERENCE (ECC), 2019, : 530 - 535
  • [7] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [8] Multi-Agent Exploration for Faster and Reliable Deep Q-Learning Convergence in Reinforcement Learning
    Majumdar, Abhijit
    Benavidez, Patrick
    Jamshidi, Mo
    2018 WORLD AUTOMATION CONGRESS (WAC), 2018, : 222 - 227
  • [9] Gradient boosting in crowd ensembles for Q-learning using weight sharing
    Elliott, D. L.
    Santosh, K. C.
    Anderson, Charles
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, 11 (10) : 2275 - 2287