An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm

Cited by: 44
Authors
Spano, Sergio [1 ]
Cardarilli, Gian Carlo [1 ]
Di Nunzio, Luca [1 ]
Fazzolari, Rocco [1 ]
Giardino, Daniele [1 ]
Matta, Marco [1 ]
Nannarelli, Alberto [2 ]
Re, Marco [1 ]
Affiliations
[1] Univ Roma Tor Vergata, Dept Elect Engn, I-00133 Rome, Italy
[2] Danmarks Tekniske Univ, Dept Appl Math & Comp Sci, DK-2800 Lyngby, Denmark
Source
IEEE ACCESS | 2019, Vol. 7
Keywords
Artificial intelligence; hardware accelerator; machine learning; Q-learning; reinforcement learning; SARSA; FPGA; ASIC; IoT; multi-agent
DOI
10.1109/ACCESS.2019.2961174
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
In this paper, we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput, and limited hardware resource usage. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit and evaluated the implementation results in terms of hardware resources, throughput, and power consumption. Compared to the state-of-the-art Q-Learning hardware accelerators presented in the literature, the architecture obtains better results in speed, power, and hardware resources. Experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic are presented. With a Q-Matrix of size 8 x 4 (8-bit data), we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size 256 x 16 (32-bit data), we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Due to the small amount of hardware resources required by the accelerator, our system is suitable for multi-agent IoT applications. Moreover, the architecture can implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
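The abstract does not specify how the approximated multipliers work, so the sketch below rests on a stated assumption: a common hardware simplification is to restrict the learning rate alpha and the discount factor gamma to powers of two, turning each multiplication in the Q-Learning update Q[s,a] += alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a]) into an arithmetic shift. The following Python model (all names and parameter values are illustrative, not taken from the paper) mimics such a fixed-point datapath for the 8 x 4 Q-Matrix configuration mentioned above.

import numpy as np

# Fixed-point Q-Learning update with shift-based "approximated multipliers".
# ASSUMPTION: alpha and gamma are restricted to negative powers of two, which
# is one common way to remove hardware multipliers; the paper's exact
# approximation scheme is not described in the abstract.
N_STATES, N_ACTIONS = 8, 4   # matches the paper's 8 x 4 Q-Matrix test case
FRAC_BITS = 8                # fractional bits of the fixed-point format (illustrative)
ALPHA_SHIFT = 3              # alpha = 2**-3 = 0.125
GAMMA_SHIFT = 1              # gamma = 2**-1 = 0.5

Q = np.zeros((N_STATES, N_ACTIONS), dtype=np.int32)  # Q-values stored as integers

def q_update(s, a, r, s_next):
    """One step of Q[s,a] += alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a]).
    Multiplications by alpha and gamma become arithmetic right shifts."""
    r_fx = int(round(r * (1 << FRAC_BITS)))   # quantize reward to fixed point
    best_next = int(Q[s_next].max())          # max over next-state actions
    td_error = r_fx + (best_next >> GAMMA_SHIFT) - int(Q[s, a])
    Q[s, a] += td_error >> ALPHA_SHIFT        # shift instead of multiply

# Toy usage: random transitions, reward 1.0 only when the last state is reached.
rng = np.random.default_rng(0)
s = 0
for _ in range(10_000):
    a = int(rng.integers(N_ACTIONS))
    s_next = int(rng.integers(N_STATES))
    q_update(s, a, 1.0 if s_next == N_STATES - 1 else 0.0, s_next)
    s = s_next
print(Q)

Consistent with the abstract's closing remark, replacing the max over Q[s_next] with the entry for the actually chosen next action would turn this same datapath into a SARSA update.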
Pages: 186340-186351 (12 pages)
Related Papers
50 in total
  • [1] Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA
    Da Silva, Lucileide M. D.
    Torquato, Matheus F.
    Fernandes, Marcelo A. C.
    [J]. IEEE ACCESS, 2019, 7 : 2782 - 2798
  • [2] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [3] Efficient implementation of dynamic fuzzy Q-learning
    Deng, C
    Er, MJ
    [J]. ICICS-PCM 2003, VOLS 1-3, PROCEEDINGS, 2003: 1854 - 1858
  • [4] An inverse reinforcement learning framework with the Q-learning mechanism for the metaheuristic algorithm
    Zhao, Fuqing
    Wang, Qiaoyun
    Wang, Ling
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 265
  • [5] Fuzzy Q-Learning for generalization of reinforcement learning
    Berenji, HR
    [J]. FUZZ-IEEE '96 - PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, 1996: 2208 - 2214
  • [6] Deep Reinforcement Learning with Double Q-Learning
    van Hasselt, Hado
    Guez, Arthur
    Silver, David
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016: 2094 - 2100
  • [7] Reinforcement learning guidance law of Q-learning
    Zhang, Qinhao
    Ao, Baiqiang
    Zhang, Qinxue
    [J]. Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2020, 42 (02): 414 - 419
  • [8] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [9] Feasible Q-Learning for Average Reward Reinforcement Learning
    Jin, Ying
    Blanchet, Jose
    Gummadi, Ramki
    Zhou, Zhengyuan
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [10] An online scalarization multi-objective reinforcement learning algorithm: TOPSIS Q-learning
    Mirzanejad, Mohammad
    Ebrahimi, Morteza
    Vamplew, Peter
    Veisi, Hadi
    [J]. KNOWLEDGE ENGINEERING REVIEW, 2022, 37 (04)