Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA

Cited by: 35
Authors
Da Silva, Lucileide M. D. [1 ]
Torquato, Matheus F. [2 ]
Fernandes, Marcelo A. C. [3 ]
Affiliations
[1] Fed Inst Rio Grande do Norte, Dept Comp Sci & Technol, BR-59200000 Santa Cruz, Brazil
[2] Swansea Univ, Coll Engn, Swansea SA2 8PP, W Glam, Wales
[3] Univ Fed Rio Grande do Norte, Dept Comp Engn & Automat, BR-59078970 Natal, RN, Brazil
Source
IEEE ACCESS, 2019, Vol. 7
Keywords
FPGA; Q-learning; reinforcement learning; reconfigurable computing; hardware; architecture; network
DOI
10.1109/ACCESS.2018.2885950
Chinese Library Classification
TP [automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Q-learning is an off-policy reinforcement learning technique whose main advantage is that it can obtain an optimal policy while interacting with an environment whose model is unknown. This paper proposes a parallel fixed-point Q-learning architecture implemented on a field-programmable gate array (FPGA), with a focus on optimizing processing time. Convergence results are presented, and the processing time and occupied area are analyzed for scenarios with different numbers of states and actions and for various fixed-point formats. The accuracy of the Q-learning response and the resolution error associated with reducing the number of bits are also studied for the hardware implementation. Implementation details of the architecture are presented. The entire project was developed using the Xilinx System Generator platform, targeting a Virtex-6 xc6vcx240t-1ff1156 FPGA.
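For orientation, the sketch below is a minimal tabular Q-learning update in Python with a simple fixed-point quantization step, illustrating the kind of resolution loss the paper analyzes. It is not the authors' implementation: the fractional-bit count, function names, learning parameters, and toy problem size are illustrative assumptions.

```python
import numpy as np

def to_fixed_point(x, n_frac=12):
    """Quantize a value to a fixed-point grid with n_frac fractional bits,
    mimicking the resolution loss the paper studies (the format here is
    illustrative, not the paper's exact choice)."""
    scale = 2.0 ** n_frac
    return np.round(x * scale) / scale

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, n_frac=12):
    """One tabular Q-learning update,
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    with the result quantized to fixed point."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] = to_fixed_point(Q[s, a] + alpha * (td_target - Q[s, a]), n_frac)
    return Q

# Toy usage: 4 states x 2 actions.
Q = np.zeros((4, 2))
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=2)
print(Q)
```

In the paper's architecture this arithmetic is realized in parallel fixed-point hardware rather than executed sequentially as above; the sketch only mirrors the update rule being accelerated.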
Pages: 2782-2798
Page count: 17