Self-Regulating Action Exploration in Reinforcement Learning

Cited by: 9
Authors
Teng, Teck-Hou [1 ]
Tan, Ah-Hwee [1 ]
Tan, Yuan-Sin [2 ]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore, Singapore
[2] DSO Natl Labs, Science Park Drive, Singapore
Keywords
reinforcement learning; exploration-exploitation dilemma; k-armed bandit; pursuit-evasion; self-organizing neural network;
DOI
10.1016/j.procs.2012.09.110
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The basic tenet of a learning process is for an agent to learn only as much and for as long as is necessary. With reinforcement learning, the learning process is divided between exploration and exploitation. Given the complexity of the problem domain and the randomness of the learning process, the exact duration of the reinforcement learning process can never be known with certainty. Using an inaccurate number of training iterations leads either to non-convergence or to over-training of the learning agent. This work addresses such issues by proposing a technique to self-regulate the exploration rate and training duration, leading to efficient convergence. The idea originates from the intuitive understanding that exploration is only necessary when the success rate is low; exploration should therefore be conducted in inverse proportion to the rate of success. In addition, the change in exploration-exploitation rates alters the duration of the learning process, so that duration becomes adaptive to the updated status of the learning process. Experimental results from the K-Armed Bandit and Air Combat Maneuver scenarios show that optimal action policies can be discovered using the right number of training iterations. In essence, the proposed method eliminates the guesswork about the amount of exploration needed during reinforcement learning. (C) 2012 Published by Elsevier B.V. Selection and/or peer-review under responsibility of the Program Committee of INNS-WC 2012
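As a sketch of the idea described in the abstract, the following Python fragment ties the exploration rate of an epsilon-greedy agent to its observed success rate on a K-armed Bernoulli bandit. The linear rule epsilon = 1 - success_rate, the stopping thresholds, and all names below are illustrative assumptions; the paper's actual agent is a self-organizing neural network, and its precise self-regulation rule is not reproduced here.

    import random

    # K-armed Bernoulli bandit with success-rate-driven exploration.
    K = 10
    random.seed(0)
    true_means = [random.random() for _ in range(K)]  # hidden payoff probabilities

    counts = [0] * K
    values = [0.0] * K            # incremental estimate of each arm's payoff
    successes, trials = 0, 0
    max_iters = 5000              # hard cap so training always terminates

    for _ in range(max_iters):
        success_rate = successes / trials if trials else 0.0
        epsilon = 1.0 - success_rate   # explore in inverse proportion to success

        if random.random() < epsilon:
            arm = random.randrange(K)                     # explore
        else:
            arm = max(range(K), key=lambda a: values[a])  # exploit

        reward = 1 if random.random() < true_means[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean
        successes += reward
        trials += 1

        # Self-regulated stopping: end training once the agent mostly
        # exploits and performs well (threshold values are assumptions).
        if trials > 100 and success_rate > 0.9:
            break

    print(f"stopped after {trials} pulls; best arm = {values.index(max(values))}")

Because rewards are Bernoulli, the long-run success rate is bounded by the best arm's payoff probability, so some residual exploration always remains under this particular rule; the iteration cap guarantees termination even if the stopping threshold is never reached.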
Pages: 18 - 30
Number of pages: 13
Related Papers
50 records in total
  • [31] Examples of self-regulating cycles and their insertion in the process of musical self-learning
    Correa, Antenor Ferreira
    Morato Martins, Luciana Stadniki
    MUSICA HODIE, 2020, 20
  • [32] A Self-Regulating Power-Control Scheme Using Reinforcement Learning for D2D Communication Networks
    Ban, Tae-Won
    SENSORS, 2022, 22 (13)
  • [33] Triathletes are experts in self-regulating physical activity - But what about self-regulating neural activity?
    Kober, Silvia Erika
    Ninaus, Manuel
    Witte, Matthias
    Buchrieser, Finn
    Groessinger, Doris
    Fischmeister, Florian Ph S.
    Neuper, Christa
    Wood, Guilherme
    BIOLOGICAL PSYCHOLOGY, 2022, 173
  • [34] Self-regulating studying by objectives for learning: Students' reports compared to a model
    Winne, PH
    Jamieson-Noel, D
    CONTEMPORARY EDUCATIONAL PSYCHOLOGY, 2003, 28 (03) : 259 - 276
  • [35] The self-regulating brain: Cortical-subcortical feedback and the development of intelligent action
    Lewis, Marc D.
    Todd, Rebecca M.
    COGNITIVE DEVELOPMENT, 2007, 22 (04) : 406 - 430
  • [36] Education for Sustainable Development: Self-Regulating Learning Strategies in an Online Environment
    Kabanov, Oleg Vladimirovich
    Tokareva, Julia Sergeevna
    Altunina, Yulia Olegovna
    Gorlova, Olga Evgenievna
    Filonova, Anna Sergeevna
    Stoyanova, Lela
    INTERNATIONAL JOURNAL OF APPLIED EXERCISE PHYSIOLOGY, 2020, 9 (12) : 196 - 199
  • [37] Perceptions of course efficacy: A self-regulating learning approach for creating relevance
    Wittenburg, DK
    RESEARCH QUARTERLY FOR EXERCISE AND SPORT, 2002, 73 (01) : A84 - A84
  • [38] Safe Exploration of State and Action Spaces in Reinforcement Learning
    Garcia, Javier
    Fernandez, Fernando
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2012, 45 : 515 - 564
  • [39] PLANT PHYTOTOXICITY - A SELF-REGULATING PATHWAY
    [Anonymous]
    BIOCYCLE, 1989, 30 (07) : 12 - 12
  • [40] OSCILLATIONS OF A SYSTEM WITH SELF-REGULATING DELAY
    NORKIN, SB
    JOURNAL OF THE ASTRONAUTICAL SCIENCES, 1966, 13 (01) : 45 - &