Efficient Off-Policy Reinforcement Learning via Brain-Inspired Computing

Cited by: 0
Authors
Ni, Yang [1 ]
Abraham, Danny [1 ]
Issa, Mariam [1 ]
Kim, Yeseong [2 ]
Mercati, Pietro [3 ]
Imani, Mohsen [1 ]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92717 USA
[2] Daegu Gyeongbuk Inst Sci & Technol, Daegu, South Korea
[3] Intel Labs, Hillsboro, OR USA
Funding
US National Science Foundation
Keywords
Hyperdimensional Computing; Brain-inspired Computing
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) has opened up new opportunities to enhance existing smart systems that generally involve complex decision-making processes. However, modern RL algorithms, e.g., Deep Q-Networks (DQN), are based on deep neural networks, resulting in high computational costs. In this paper, we propose QHD, an off-policy, value-based hyperdimensional reinforcement learning algorithm that mimics properties of the brain to enable robust, real-time learning. QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment. On both desktop and power-limited embedded platforms, QHD achieves significantly better overall efficiency than DQN while providing higher or comparable rewards. QHD is also suitable for highly efficient reinforcement learning, with great potential for online and real-time learning. Our solution supports a small experience replay batch size that provides a 12.3x speedup over DQN while ensuring minimal quality loss. Our evaluation demonstrates QHD's capability for real-time learning, providing a 34.6x speedup and significantly better quality of learning than DQN.
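The abstract describes QHD only at a high level, so the following is a minimal sketch of what an off-policy, value-based hyperdimensional Q-model can look like, assuming a fixed random-feature encoder, one trainable model hypervector per discrete action, and a standard TD(0) Q-learning update. The class name HDQModel, the encoder, and all hyperparameters (D, GAMMA, LR) are illustrative assumptions, not the authors' published QHD design.

import numpy as np

D = 4096      # hypervector dimensionality (illustrative choice)
GAMMA = 0.99  # discount factor
LR = 0.05     # model-hypervector learning rate

rng = np.random.default_rng(0)

class HDQModel:
    """Sketch of a value-based hyperdimensional Q-model (hypothetical, not the paper's exact design)."""

    def __init__(self, state_dim, n_actions):
        # Fixed random encoder parameters; in HDC the encoder is
        # typically generated once and never trained.
        self.proj = rng.normal(size=(D, state_dim))
        self.bias = rng.uniform(0.0, 2.0 * np.pi, size=D)
        # One trainable model hypervector per discrete action.
        self.models = np.zeros((n_actions, D))

    def encode(self, state):
        # Nonlinear random-feature encoding of the state into D dimensions.
        return np.cos(self.proj @ state + self.bias)

    def q_values(self, state):
        # Q(s, a) is the dot product between the encoded state and
        # the model hypervector of each action a.
        return self.models @ self.encode(state)

    def update(self, s, a, r, s_next, done):
        # One off-policy TD(0) step: nudge the chosen action's model
        # hypervector toward the bootstrapped Q-learning target.
        phi = self.encode(s)
        target = r if done else r + GAMMA * float(np.max(self.q_values(s_next)))
        td_error = target - float(self.models[a] @ phi)
        self.models[a] += LR * td_error * phi

# Example usage on a hypothetical 4-dimensional, 2-action task:
model = HDQModel(state_dim=4, n_actions=2)
s = rng.normal(size=4)
a = int(np.argmax(model.q_values(s)))  # greedy action
model.update(s, a, 1.0, rng.normal(size=4), done=False)

In the paper's setting, such single-transition updates would be applied over mini-batches drawn from an experience replay buffer; the small batch size the abstract highlights is what drives the reported speedup over DQN.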
Pages: 449-453 (5 pages)