Enhanced Experience Prioritization: A Novel Upper Confidence Bound Approach

Cited by: 1
Authors
Kovari, Balint [1]
Pelenczei, Balint [2]
Becsi, Tamas [1]
Affiliations
[1] Budapest Univ Technol & Econ, Fac Transportat Engn & Vehicle Engn, Dept Control Transportat & Vehicle Syst, H-1111 Budapest, Hungary
[2] Inst Comp Sci & Control, Syst & Control Lab, H-1111 Budapest, Hungary
Keywords
Deep learning; experience prioritization; experience replay; machine learning; Q-learning; reinforcement learning; sampling
DOI
10.1109/ACCESS.2023.3339248
CLC classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Value-based Reinforcement Learning algorithms achieve superior performance by using experiences gathered in the past to update their so-called value function. In most cases, this is accomplished by applying a sampling strategy to an experience buffer in which state transitions are stored during training. However, designing such sampling methods is not straightforward. General theoretical approaches tend to estimate the expected learning progress of each experience and use this estimate to carry out the neural network updates efficiently. A proper choice of method can not only accelerate but also significantly stabilize training by increasing sampling efficiency, which indirectly reduces time and computing-capacity requirements. Because one of the most critical limitations of Machine Learning (ML) based techniques is the lack of sufficient computing power, the search for optimal solutions has long been an active research topic in Reinforcement Learning. Therefore, the main focus of this research has been to develop an experience prioritization method that achieves competitive performance while considerably lowering the overall cost of training. In this paper, we propose a novel priority value assignment concept for experience prioritization in Reinforcement Learning, based on the Upper Confidence Bound algorithm. Furthermore, we present empirical findings showing that our solution outperforms the current state of the art in terms of sampling efficiency, while enabling faster and more cost-efficient training processes.
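To make the general idea of UCB-based experience prioritization concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' method, whose exact priority formula is not given in this abstract. It treats every stored transition as a bandit "arm", uses the absolute TD error observed when a transition is replayed as that arm's reward, and ranks transitions with the standard UCB1 score, mean reward plus c * sqrt(ln N / n_i), where N is the total number of replays and n_i the replay count of transition i. The class name `UCBReplayBuffer`, the exploration coefficient `c`, and the greedy top-k batch selection are all hypothetical choices made for illustration.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class UCBReplayBuffer:
    """Hypothetical replay buffer that ranks transitions by a UCB1-style score.

    Assumption: each transition is a bandit arm whose reward is the
    absolute TD error observed each time it is replayed.
    """
    capacity: int
    c: float = 1.0  # exploration coefficient (assumed hyperparameter)
    data: list = field(default_factory=list)
    td_sum: list = field(default_factory=list)  # running sum of |TD error| per slot
    counts: list = field(default_factory=list)  # number of replays per slot
    total_draws: int = 0  # N: total replays across all slots
    next_idx: int = 0  # ring-buffer write position

    def add(self, transition):
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.td_sum.append(0.0)
            self.counts.append(0)
        else:
            # Overwrite the oldest slot and reset its statistics.
            self.data[self.next_idx] = transition
            self.td_sum[self.next_idx] = 0.0
            self.counts[self.next_idx] = 0
        self.next_idx = (self.next_idx + 1) % self.capacity

    def score(self, i):
        if self.counts[i] == 0:
            return math.inf  # UCB convention: try every arm at least once
        mean = self.td_sum[i] / self.counts[i]
        bonus = self.c * math.sqrt(math.log(max(self.total_draws, 1)) / self.counts[i])
        return mean + bonus

    def sample(self, batch_size):
        # Shuffle first so that the stable sort breaks score ties randomly,
        # then return the indices of the batch_size highest-scoring slots.
        order = list(range(len(self.data)))
        random.shuffle(order)
        order.sort(key=self.score, reverse=True)
        return order[:batch_size]

    def update(self, indices, td_errors):
        # Record the TD errors observed after training on the sampled batch.
        for i, err in zip(indices, td_errors):
            self.td_sum[i] += abs(err)
            self.counts[i] += 1
            self.total_draws += 1
```

A typical loop would call `sample`, train on the returned transitions, and pass the resulting TD errors back through `update`. Greedy top-k selection mirrors how UCB1 picks the arm with the highest bound; using the same score as a priority for proportional sampling, as in prioritized experience replay, would be a different design choice.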
Pages: 138488-138501
Number of pages: 14