Quantum-enhanced reinforcement learning for control: a preliminary study

Cited by: 2
Authors
Hu, Yazhou [1]
Tang, Fengzhen [2,3]
Chen, Jun [1]
Wang, Wenxue [2,3]
Affiliations
[1] Northwest A&F Univ, Coll Mech & Elect Engn, Yangling 712100, Shaanxi, Peoples R China
[2] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Shenyang 110016, Liaoning, Peoples R China
[3] Chinese Acad Sci, Inst Robot & Intelligent Mfg, Shenyang 110169, Liaoning, Peoples R China
Keywords
Quantum theory; Reinforcement learning; Quantum computation; State superposition; Optimal control; SPEED-UP; COMPUTATION; IMPLEMENTATION; ALGORITHM
DOI
10.1007/s11768-021-00063-x
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Reinforcement learning is one of the fastest-growing areas of machine learning and has achieved notable results in biomedicine, the Internet of Things (IoT), logistics, robotic control, and other fields. However, engineering applications still face many challenges, such as how to speed up the learning process and how to balance the trade-off between exploration and exploitation. Quantum technology, which can solve complex problems faster than classical methods, especially in supercomputers, provides a new paradigm for overcoming these challenges in reinforcement learning. In this paper, a quantum-enhanced reinforcement learning algorithm is proposed for optimal control. In this algorithm, the states and actions of reinforcement learning are quantized by quantum technology. A probability amplification method is then presented, which uses the quantized representation to effectively avoid the trade-off between exploration and exploitation. Finally, the optimal control policy is learnt during the reinforcement learning process. The performance of this quantized algorithm is demonstrated in the MountainCar and CartPole environments, both classic control reinforcement learning environments in the OpenAI Gym. The preliminary results show that, compared with Q-learning, this quantized reinforcement learning method achieves better control performance without the need to consider the trade-off between exploration and exploitation. The learning performance of the new algorithm is stable across learning rates from 0.01 to 0.10, which suggests that it is promising for systems with unknown dynamics.
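The abstract does not spell out how probability amplification is carried out. As a rough illustration only, the sketch below simulates classically one common way of amplifying the probability of a chosen action: a Grover-style iteration (phase flip followed by inversion about the mean) applied to a real amplitude vector over actions. The function names, the use of Grover iterations, and the choice of which action to mark are all assumptions made for illustration, not the authors' published procedure.

```python
import math
import random


def uniform_amplitudes(n_actions):
    """Equal superposition over actions: each has probability 1/n_actions."""
    return [1.0 / math.sqrt(n_actions)] * n_actions


def grover_iteration(amps, target):
    """One Grover-style step (assumed amplification rule, not from the paper)."""
    # Oracle: flip the sign of the marked action's amplitude.
    flipped = [-a if i == target else a for i, a in enumerate(amps)]
    # Diffusion: reflect every amplitude about the mean amplitude.
    mean = sum(flipped) / len(flipped)
    return [2.0 * mean - a for a in flipped]


def amplify(amps, target, n_iter):
    """Apply n_iter Grover-style steps marking the same action."""
    for _ in range(n_iter):
        amps = grover_iteration(amps, target)
    return amps


def measure(amps, rng):
    """Simulated measurement: sample an action with probability amplitude**2."""
    probs = [a * a for a in amps]
    return rng.choices(range(len(amps)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    amps = amplify(uniform_amplitudes(4), target=2, n_iter=1)
    print([round(a * a, 3) for a in amps], measure(amps, rng))
```

In this sketch, action selection is just a measurement of the amplitude vector, so exploration comes from the sampling itself rather than from a separately tuned rule such as epsilon-greedy. How many iterations to apply, and which action to mark after each reward, would have to be taken from the paper itself.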
Pages: 455-464
Number of pages: 10