Efficient Off-Policy Reinforcement Learning via Brain-Inspired Computing

Cited by: 0

Authors:

Ni, Yang [1]

Abraham, Danny [1]

Issa, Mariam [1]

Kim, Yeseong [2]

Mercati, Pietro [3]

Imani, Mohsen [1]

Affiliations:

[1] Univ Calif Irvine, Irvine, CA 92717 USA

[2] Daegu Gyeongbuk Inst Sci & Technol, Daegu, South Korea

[3] Intel Labs, Hillsboro, OR USA

Source:

PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2023, GLSVLSI 2023 | 2023

Funding:

U.S. National Science Foundation;

Keywords:

Hyperdimensional Computing; Brain-inspired Computing;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement Learning (RL) has opened up new opportunities to enhance existing smart systems that generally include a complex decision-making process. However, modern RL algorithms, e.g., Deep Q-Networks (DQN), are based on deep neural networks, resulting in high computational costs. In this paper, we propose QHD, an off-policy value-based Hyperdimensional Reinforcement Learning, that mimics brain properties toward robust and real-time learning. QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment. On both desktop and power-limited embedded platforms, QHD achieves significantly better overall efficiency than DQN while providing higher or comparable rewards. QHD is also suitable for highly efficient reinforcement learning with great potential for online and real-time learning. Our solution supports a small experience replay batch size that provides 12.3x speedup compared to DQN while ensuring minimal quality loss. Our evaluation shows QHD capability for real-time learning, providing 34.6x speedup and significantly better quality of learning than DQN.


Pages: 449-453

Page count: 5
