Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control

Cited by: 4
Authors
Uragami, Daisuke [1 ]
Takahashi, Tatsuji [2 ]
Matsuo, Yoshiki [1 ]
Affiliations
[1] Tokyo Univ Technol, Sch Comp Sci, Hachioji, Tokyo 1920982, Japan
[2] Tokyo Denki Univ, Sch Sci & Technol, Hiki, Saitama 3500394, Japan
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
Q-learning; Exploration-exploitation dilemma; Bio-inspired computing; Cognitive bias; Loosely symmetric model; Acrobot; Multi-armed bandit problems; Acquisition; Model; Behavior; Map
DOI
10.1016/j.biosystems.2013.11.002
CLC classification
Q (Biological Sciences)
Subject classification
07; 0710; 09
Abstract
Many algorithms and methods in artificial intelligence and machine learning have been inspired by human cognition. As a mechanism for handling the exploration-exploitation dilemma in reinforcement learning, the loosely symmetric (LS) value function, which models human causal intuition, was proposed (Shinohara et al., 2007). LS shows the highest correlation with human causal induction, and it has been reported to work effectively in multi-armed bandit problems, the simplest class of tasks representing the dilemma. However, the scope of application of LS has been limited to reinforcement learning problems with K actions and only one state (K-armed bandit problems). This study proposes the LS-Q learning architecture, which can deal with general reinforcement learning tasks with multiple states and delayed reward. We tested the learning performance of the new architecture on giant-swing robot motion learning, where the uncertainty and unknownness of the environment are large. In the test, no ready-made internal models or function approximation of the state space were provided. The simulations showed that while the ordinary Q-learning agent fails to acquire the giant-swing motion because of stagnant loops (local optima with low rewards), LS-Q escapes such loops and acquires the giant-swing. We confirmed that the smaller the number of states, in other words, the more coarse-grained the division of states and the more incomplete the state observation, the better LS-Q performs in comparison with Q-learning. We also showed that the high performance of LS-Q depends comparatively little on parameter tuning and learning time. This suggests that the proposed method, inspired by human cognition, works adaptively in real environments. (C) 2013 Elsevier Ireland Ltd. All rights reserved.
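
To make the baseline concrete, below is a minimal sketch of the ordinary tabular Q-learning agent the abstract compares LS-Q against: epsilon-greedy exploration over a coarse-grained discrete state space. The environment interface (env.reset(), env.step()), the action count, and all parameter values are illustrative assumptions, not taken from the paper; the LS value function itself is specified in Shinohara et al. (2007) and is not reproduced here. In an LS-Q agent it would transform the Q-values inside the action-selection step.

import random
from collections import defaultdict

# Minimal tabular Q-learning with epsilon-greedy exploration.
# This is the ordinary baseline the abstract refers to; an LS-Q agent
# would replace the raw Q-values in select_action() with LS-transformed
# values (Shinohara et al., 2007) before choosing greedily.
# `env` is a hypothetical discrete environment with reset()/step().

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # illustrative values only
N_ACTIONS = 3                            # e.g., joint torque in {-1, 0, +1}

Q = defaultdict(lambda: [0.0] * N_ACTIONS)

def select_action(state):
    # Explore with probability EPSILON, otherwise exploit.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    qs = Q[state]
    return max(range(N_ACTIONS), key=lambda a: qs[a])

def run_episode(env, max_steps=1000):
    state = env.reset()                  # coarse-grained discrete state id
    for _ in range(max_steps):
        action = select_action(state)
        next_state, reward, done = env.step(action)
        # One-step Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + GAMMA * max(Q[next_state])
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = next_state
        if done:
            break

With a very coarse state division, this baseline is exactly where the stagnant loops described above arise; the paper's claim is that substituting LS-transformed values for the raw Q-values in action selection lets the agent escape such loops with comparatively little parameter tuning.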
Pages: 1-9 (9 pages)