Enhanced Experience Prioritization: A Novel Upper Confidence Bound Approach

Cited by: 1
Authors
Kovari, Balint [1]
Pelenczei, Balint [2]
Becsi, Tamas [1]
Affiliations
[1] Budapest Univ Technol & Econ, Fac Transportat Engn & Vehicle Engn, Dept Control Transportat & Vehicle Syst, H-1111 Budapest, Hungary
[2] Inst Comp Sci & Control, Syst & Control Lab, H-1111 Budapest, Hungary
Keywords
Deep learning; experience prioritization; experience replay; machine learning; Q-learning; reinforcement learning; sampling
DOI
10.1109/ACCESS.2023.3339248
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Value-based Reinforcement Learning algorithms achieve superior performance by using experiences gathered in the past to update their value function. In most cases, this is accomplished by applying a sampling strategy to an experience buffer in which state transitions are stored during training. However, designing such strategies is not intuitive. General theoretical approaches tend to estimate the expected learning progress from each experience, based on which the neural networks can be updated efficiently. A proper choice of method can not only accelerate but also significantly stabilize training by increasing sampling efficiency, which indirectly reduces time and computing capacity requirements. Because one of the most critical obstacles to using Machine Learning (ML) based techniques is the lack of sufficient computing power, the search for optimal solutions has long been a research topic in Reinforcement Learning. The main focus of this research has therefore been to develop an experience prioritization method that achieves competitive performance while considerably lowering the overall cost of training. In this paper, we propose a novel priority value assignment concept for experience prioritization in Reinforcement Learning, based on the Upper Confidence Bound algorithm. Furthermore, we present empirical findings showing that our solution outperforms the current state of the art in terms of sampling efficiency while enabling faster and more cost-efficient training.
Pages: 138488-138501
Number of pages: 14
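
The abstract describes assigning priority values to stored transitions using the Upper Confidence Bound idea, but this record does not give the paper's actual formula. The following is only a minimal sketch of how a UCB-style priority could be attached to a replay buffer, assuming the classic UCB1 form: the absolute TD error as the exploitation term plus an exploration bonus that shrinks the more often a transition has been sampled. The class name `UCBReplayBuffer`, the weight `c`, the FIFO eviction, and the greedy top-k selection are all hypothetical choices made for illustration, not the authors' method.

```python
import math


class UCBReplayBuffer:
    """Toy replay buffer that ranks stored transitions with a UCB1-style score.

    Assumption (not taken from the paper):
        score_i = |td_error_i| + c * sqrt(ln(N) / n_i)
    where n_i counts how often transition i was sampled and N is the
    total number of draws made so far.
    """

    def __init__(self, capacity, c=1.0):
        self.capacity = capacity
        self.c = c                # exploration weight (hypothetical hyperparameter)
        self.transitions = []
        self.td_errors = []
        self.counts = []
        self.total_draws = 0

    def add(self, transition, td_error=1.0):
        if len(self.transitions) == self.capacity:
            # Buffer full: evict the oldest transition (simple FIFO policy).
            # Note that this shifts the indices of the remaining entries.
            for store in (self.transitions, self.td_errors, self.counts):
                store.pop(0)
        self.transitions.append(transition)
        self.td_errors.append(abs(td_error))
        self.counts.append(0)

    def score(self, index):
        # Never-sampled transitions get an infinite score so each is tried once.
        if self.counts[index] == 0:
            return math.inf
        bonus = self.c * math.sqrt(math.log(self.total_draws) / self.counts[index])
        return self.td_errors[index] + bonus

    def sample(self, batch_size):
        # Greedily pick the indices with the highest UCB-style scores.
        k = min(batch_size, len(self.transitions))
        ranked = sorted(range(len(self.transitions)), key=self.score, reverse=True)
        chosen = ranked[:k]
        for i in chosen:
            self.counts[i] += 1
        self.total_draws += k
        return chosen

    def update(self, index, td_error):
        # Refresh the exploitation term after the learner recomputes the TD error.
        self.td_errors[index] = abs(td_error)
```

In a training loop one would call `sample` to obtain indices, train on `buffer.transitions[i]` for each index, and then pass the recomputed TD errors to `update`. A stochastic variant that samples proportionally to the scores instead of taking the top k would be an equally plausible design.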