An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm

Cited by: 44
Authors
Spano, Sergio [1 ]
Cardarilli, Gian Carlo [1 ]
Di Nunzio, Luca [1 ]
Fazzolari, Rocco [1 ]
Giardino, Daniele [1 ]
Matta, Marco [1 ]
Nannarelli, Alberto [2 ]
Re, Marco [1 ]
Affiliations
[1] Univ Roma Tor Vergata, Dept Elect Engn, I-00133 Rome, Italy
[2] Danmarks Tekniske Univ, Dept Appl Math & Comp Sci, DK-2800 Lyngby, Denmark
Source
IEEE ACCESS | 2019, Vol. 7
Keywords
Artificial intelligence; hardware accelerator; machine learning; Q-learning; reinforcement learning; SARSA; FPGA; ASIC; IoT; multi-agent
DOI
10.1109/ACCESS.2019.2961174
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
In this paper, we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput, and limited hardware resource usage. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit and evaluated the implementation results in terms of hardware resources, throughput, and power consumption. Compared to the state-of-the-art Q-Learning hardware accelerators presented in the literature, the architecture obtains better results in speed, power, and hardware resources. Experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic are presented. With a Q-Matrix of size 8 x 4 (8-bit data), we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size 256 x 16 (32-bit data), we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Due to the small amount of hardware resources required by the accelerator, our system is suitable for multi-agent IoT applications. Moreover, the architecture can implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
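The abstract does not specify how the approximated multipliers work, so the sketch below rests on a stated assumption: a common hardware simplification is to restrict the learning rate alpha and the discount factor gamma to powers of two, turning each multiplication in the Q-Learning update Q[s,a] += alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a]) into an arithmetic shift. The following Python model (all names and parameter values are illustrative, not taken from the paper) mimics such a fixed-point datapath for the 8 x 4 Q-Matrix configuration mentioned above.

import numpy as np

# Fixed-point Q-Learning update with shift-based "approximated multipliers".
# ASSUMPTION: alpha and gamma are restricted to negative powers of two, which
# is one common way to remove hardware multipliers; the paper's exact
# approximation scheme is not described in the abstract.
N_STATES, N_ACTIONS = 8, 4   # matches the paper's 8 x 4 Q-Matrix test case
FRAC_BITS = 8                # fractional bits of the fixed-point format (illustrative)
ALPHA_SHIFT = 3              # alpha = 2**-3 = 0.125
GAMMA_SHIFT = 1              # gamma = 2**-1 = 0.5

Q = np.zeros((N_STATES, N_ACTIONS), dtype=np.int32)  # Q-values stored as integers

def q_update(s, a, r, s_next):
    """One step of Q[s,a] += alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a]).
    Multiplications by alpha and gamma become arithmetic right shifts."""
    r_fx = int(round(r * (1 << FRAC_BITS)))   # quantize reward to fixed point
    best_next = int(Q[s_next].max())          # max over next-state actions
    td_error = r_fx + (best_next >> GAMMA_SHIFT) - int(Q[s, a])
    Q[s, a] += td_error >> ALPHA_SHIFT        # shift instead of multiply

# Toy usage: random transitions, reward 1.0 only when the last state is reached.
rng = np.random.default_rng(0)
s = 0
for _ in range(10_000):
    a = int(rng.integers(N_ACTIONS))
    s_next = int(rng.integers(N_STATES))
    q_update(s, a, 1.0 if s_next == N_STATES - 1 else 0.0, s_next)
    s = s_next
print(Q)

Consistent with the abstract's closing remark, replacing the max over Q[s_next] with the entry for the actually chosen next action would turn this same datapath into a SARSA update.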
Pages: 186340-186351 (12 pages)
Related Papers
50 in total
  • [1] Parallel Implementation of Reinforcement Learning Q-Learning Technique for FPGA
    Da Silva, Lucileide M. D.
    Torquato, Matheus F.
    Fernandes, Marcelo A. C.
    [J]. IEEE ACCESS, 2019, 7 : 2782 - 2798
  • [2] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [3] Efficient implementation of dynamic fuzzy Q-learning
    Deng, C
    Er, MJ
    [J]. ICICS-PCM 2003, VOLS 1-3, PROCEEDINGS, 2003: 1854 - 1858
  • [4] An inverse reinforcement learning framework with the Q-learning mechanism for the metaheuristic algorithm
    Zhao, Fuqing
    Wang, Qiaoyun
    Wang, Ling
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 265
  • [5] Fuzzy Q-Learning for generalization of reinforcement learning
    Berenji, HR
    [J]. FUZZ-IEEE '96 - PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, 1996: 2208 - 2214
  • [6] Deep Reinforcement Learning with Double Q-Learning
    van Hasselt, Hado
    Guez, Arthur
    Silver, David
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016: 2094 - 2100
  • [7] Reinforcement learning guidance law of Q-learning
    Zhang, Qinhao
    Ao, Baiqiang
    Zhang, Qinxue
    [J]. Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2020, 42 (02): 414 - 419
  • [8] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [9] Feasible Q-Learning for Average Reward Reinforcement Learning
    Jin, Ying
    Blanchet, Jose
    Gummadi, Ramki
    Zhou, Zhengyuan
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [10] An online scalarization multi-objective reinforcement learning algorithm: TOPSIS Q-learning
    Mirzanejad, Mohammad
    Ebrahimi, Morteza
    Vamplew, Peter
    Veisi, Hadi
    [J]. KNOWLEDGE ENGINEERING REVIEW, 2022, 37 (04)