Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces

Cited by: 8
Authors:
Elfwing, Stefan [1 ]
Uchibe, Eiji [1 ]
Doya, Kenji [1 ]
Affiliations:
[1] Grad Univ, Okinawa Inst Sci & Technol, Neural Computat Unit, Onna Son, Okinawa 9040412, Japan
Keywords:
reinforcement learning; free-energy; restricted Boltzmann machine; robot navigation; function approximation; spatial cognition; navigation; model
DOI: 10.3389/fnbot.2013.00003
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract:
Free-energy based reinforcement learning (FERL) was proposed for learning in high-dimensional state and action spaces, which cannot be handled by standard function approximation methods. In this study, we propose a scaled version of free-energy based reinforcement learning to achieve more robust and more efficient learning performance. The action-value function is approximated by the negative free energy of a restricted Boltzmann machine, divided by a constant scaling factor that is related to the size of the Boltzmann machine (the square root of the number of state nodes in this study). Our first task is a digit floor gridworld task, where the states are represented by images of handwritten digits from the MNIST data set. The purpose of the task is to investigate the proposed method's ability, through the extraction of task-relevant features in the hidden layer, to cluster images of the same digit and to cluster images of different digits that correspond to states with the same optimal action. We also test the method's robustness with respect to different exploration schedules, i.e., different settings of the initial temperature and the temperature discount rate in softmax action selection. Our second task is a robot visual navigation task, where the robot can learn its position from the different colors of the lower part of four landmarks and can infer the correct corner goal area from the color of the upper part of the landmarks. The state space consists of binarized camera images with, at most, nine different colors, which corresponds to 6642 binary states. For both tasks, the learning performance is compared with standard FERL and with function approximation where the action-value function is approximated by a two-layer feedforward neural network.
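As a concrete illustration of the core idea described in the abstract, the sketch below computes the action value as the negative free energy of a binary restricted Boltzmann machine, divided by the square root of the number of state nodes. This is a minimal reconstruction from the abstract alone, not the authors' implementation; all variable names (`W`, `b_visible`, `b_hidden`, `scaled_q_value`) are hypothetical.

```python
import numpy as np

def scaled_q_value(state, action, W, b_visible, b_hidden):
    """Sketch of the scaled FERL value estimate described in the abstract.

    Q(s, a) is approximated by the negative free energy of an RBM whose
    visible layer is the concatenation of state and action nodes, divided
    by sqrt(#state nodes). All parameter names are illustrative.
    """
    v = np.concatenate([state, action])  # visible layer: state + action nodes
    # Free energy of an RBM with binary hidden units:
    #   F(v) = -b_visible . v - sum_j log(1 + exp(W_j . v + b_hidden_j))
    hidden_input = W @ v + b_hidden
    free_energy = -b_visible @ v - np.sum(np.logaddexp(0.0, hidden_input))
    # Scaled FERL: divide the negative free energy by sqrt(#state nodes)
    return -free_energy / np.sqrt(len(state))
```

Dividing by a size-dependent constant leaves the greedy policy unchanged (it rescales all action values equally for a fixed state representation) while keeping the magnitude of the value estimates, and hence the temperature scale of softmax exploration, comparable across Boltzmann machines of different sizes.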
Pages: 10