Enhanced Experience Prioritization: A Novel Upper Confidence Bound Approach

Times Cited: 1
Authors
Kovari, Balint [1 ]
Pelenczei, Balint [2 ]
Becsi, Tamas [1 ]
Affiliations
[1] Budapest Univ Technol & Econ, Fac Transportat Engn & Vehicle Engn, Dept Control Transportat & Vehicle Syst, H-1111 Budapest, Hungary
[2] Inst Comp Sci & Control, Syst & Control Lab, H-1111 Budapest, Hungary
Keywords
Deep learning; experience prioritization; experience replay; machine learning; Q-learning; reinforcement learning; sampling
DOI
10.1109/ACCESS.2023.3339248
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Value-based Reinforcement Learning algorithms achieve superior performance by utilizing past experiences to update their value function. In most cases, this is accomplished by applying a sampling strategy to an experience buffer in which state transitions are stored during training. However, the design of such sampling methods is not intuitive. General theoretical approaches estimate the expected learning progress from each experience, so that neural network updates can be carried out efficiently. A proper choice of method can not only accelerate but also significantly stabilize training by increasing sampling efficiency, which in turn reduces time and computing requirements. Since one of the most critical obstacles to applying Machine Learning (ML) based techniques is the lack of sufficient computing power, the search for sample-efficient solutions has long been a research topic in Reinforcement Learning. The main focus of this work is therefore to develop an experience prioritization method that achieves competitive performance while considerably lowering the overall cost of training. In this paper, we propose a novel priority value assignment concept for experience prioritization in Reinforcement Learning, based on the Upper Confidence Bound algorithm. Furthermore, we present empirical findings showing that our solution outperforms the current state of the art in terms of sampling efficiency, while enabling faster and more cost-efficient training.
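The abstract describes assigning replay priorities with an Upper Confidence Bound rule but does not give the exact formulation. Below is a minimal Python sketch of how a UCB1-style score could drive experience sampling, assuming the exploitation term is a running mean of each transition's absolute TD error and the exploration bonus decays with its replay count; the names (UCBReplayBuffer, the coefficient c) are illustrative and not taken from the paper.

# Minimal sketch (the paper's exact method is not reproduced here): a
# UCB1-style priority rule for experience replay. Assumptions: the
# exploitation term is the running mean of a transition's absolute TD
# error, and the exploration bonus shrinks with its replay count.
import math
from dataclasses import dataclass

@dataclass
class Transition:
    state: object
    action: int
    reward: float
    next_state: object
    done: bool
    mean_td_error: float = 1.0   # optimistic initial estimate
    replay_count: int = 0        # times this sample has been drawn

class UCBReplayBuffer:
    def __init__(self, capacity, c=2.0):
        self.capacity = capacity
        self.c = c                 # exploration coefficient (assumed hyperparameter)
        self.buffer = []
        self.total_draws = 0

    def add(self, *transition_fields):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)     # evict the oldest transition
        self.buffer.append(Transition(*transition_fields))

    def _ucb_score(self, t):
        # UCB1 score: observed learning progress plus an exploration bonus
        # that favours rarely replayed transitions.
        if t.replay_count == 0:
            return float("inf")
        bonus = self.c * math.sqrt(math.log(self.total_draws + 1) / t.replay_count)
        return t.mean_td_error + bonus

    def sample(self, batch_size):
        batch = sorted(self.buffer, key=self._ucb_score, reverse=True)[:batch_size]
        for t in batch:
            t.replay_count += 1
            self.total_draws += 1
        return batch

    def update_errors(self, batch, td_errors):
        # After the learning step, fold the new |TD errors| into the running means.
        for t, err in zip(batch, td_errors):
            t.mean_td_error += (abs(err) - t.mean_td_error) / t.replay_count

In a deep Q-learning loop, update_errors would be called with the TD errors computed for the sampled batch, so transitions that keep producing large errors retain a high score while the bonus gradually shifts sampling toward under-visited experiences.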
Pages: 138488-138501
Page Count: 14