Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA

Cited by: 35
Authors
Da Silva, Lucileide M. D. [1 ]
Torquato, Matheus F. [2 ]
Fernandes, Marcelo A. C. [3 ]
Affiliations
[1] Fed Inst Rio Grande do Norte, Dept Comp Sci & Technol, BR-59200000 Santa Cruz, Brazil
[2] Swansea Univ, Coll Engn, Swansea SA2 8PP, W Glam, Wales
[3] Univ Fed Rio Grande do Norte, Dept Comp Engn & Automat, BR-59078970 Natal, RN, Brazil
Source
IEEE ACCESS, 2019, Vol. 7
Keywords
FPGA; Q-learning; reinforcement learning; reconfigurable computing; hardware; architecture; network
DOI
10.1109/ACCESS.2018.2885950
Chinese Library Classification
TP [automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Q-learning is an off-policy reinforcement learning technique whose main advantage is that it can obtain an optimal policy while interacting with an environment whose model is unknown. This paper proposes a parallel fixed-point Q-learning architecture implemented on a field-programmable gate array (FPGA), with a focus on optimizing processing time. Convergence results are presented, and the processing time and occupied area are analyzed for scenarios with different numbers of states and actions and for various fixed-point formats. The accuracy of the Q-learning response and the resolution error associated with reducing the number of bits are also studied for the hardware implementation. Implementation details of the architecture are presented. The entire project was developed using the Xilinx System Generator platform, targeting a Virtex-6 xc6vcx240t-1ff1156 FPGA.
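For orientation, the sketch below is a minimal tabular Q-learning update in Python with a simple fixed-point quantization step, illustrating the kind of resolution loss the paper analyzes. It is not the authors' implementation: the fractional-bit count, function names, learning parameters, and toy problem size are illustrative assumptions.

```python
import numpy as np

def to_fixed_point(x, n_frac=12):
    """Quantize a value to a fixed-point grid with n_frac fractional bits,
    mimicking the resolution loss the paper studies (the format here is
    illustrative, not the paper's exact choice)."""
    scale = 2.0 ** n_frac
    return np.round(x * scale) / scale

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, n_frac=12):
    """One tabular Q-learning update,
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    with the result quantized to fixed point."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] = to_fixed_point(Q[s, a] + alpha * (td_target - Q[s, a]), n_frac)
    return Q

# Toy usage: 4 states x 2 actions.
Q = np.zeros((4, 2))
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)
print(Q)
```

In the paper's architecture this arithmetic is realized in parallel fixed-point hardware rather than executed sequentially as above; the sketch only mirrors the update rule being accelerated.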
Pages: 2782-2798
Page count: 17