Adaptive exploration policy for exploration-exploitation tradeoff in continuous action control optimization

Cited by: 5
Authors
Li, Min [1]
Huang, Tianyi [1]
Zhu, William [1]
Affiliation
[1] Univ Elect Sci & Technol China, Inst Fundamental & Frontier Sci, Chengdu, Peoples R China
Keywords
Continuous action control optimization; Reinforcement learning; Exploration-exploitation tradeoff; Adaptive exploration policy; Stability of training
DOI
10.1007/s13042-021-01387-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The optimization of continuous action control is an important research field. It aims to find optimal decisions from the experience of making decisions in a continuous action control task. This can be achieved with reinforcement learning, which trains an agent to learn a policy that maximizes the cumulative reward of its decisions in a dynamic environment. The exploration-exploitation tradeoff is a key issue in learning this policy. The current solution, called an exploration policy, addresses this issue by adding exploration noise to the policy during training, making exploration more efficient while preserving exploitation. This noise is drawn from a distribution that stays fixed throughout training. However, in a dynamic environment, training stability changes frequently from one training episode to another, so an exploration policy with fixed noise adapts poorly to training stability. In this paper, we propose an adaptive exploration policy to address the exploration-exploitation tradeoff. The motivation is that the noise scale should be increased to enhance exploration when training stability is high, and reduced to preserve exploitation when training stability is low. First, we regard the variance of the cumulative rewards obtained from decisions as an index of training stability. Then, based on this index, we construct a tradeoff coefficient that is negatively correlated with training stability. Finally, we propose the adaptive exploration policy, which uses the tradeoff coefficient to adjust the added exploration noise to the current training stability. Theoretical analysis and experiments illustrate the effectiveness of our adaptive exploration policy. The source code can be downloaded from .
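The abstract names the three ingredients (return variance as a stability index, a tradeoff coefficient that moves opposite to stability, and a noise scale adjusted by that coefficient) but not their exact formulas. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation: the class `AdaptiveExplorationNoise`, the sliding `window` of recent episode returns, the saturating coefficient c = Var / (Var + k), the linear interpolation between hypothetical bounds `sigma_min` and `sigma_max`, and the use of clipped Gaussian noise on a deterministic policy's action are all illustrative choices.

```python
import random
import statistics
from collections import deque


class AdaptiveExplorationNoise:
    """Hypothetical sketch of return-variance-driven exploration noise.

    Assumptions (not from the paper): stability is measured by the population
    variance of the last `window` episode returns; the tradeoff coefficient is
    c = var / (var + k); the noise scale interpolates linearly between
    sigma_max (stable training) and sigma_min (unstable training).
    """

    def __init__(self, window=10, k=1.0, sigma_min=0.05, sigma_max=0.5):
        self.returns = deque(maxlen=window)  # cumulative rewards of recent episodes
        self.k = k                           # hypothetical variance scale
        self.sigma_min = sigma_min           # noise floor when training is unstable
        self.sigma_max = sigma_max           # noise ceiling when training is stable

    def record_episode(self, cumulative_reward):
        # Store the return of a finished episode; old ones fall out of the window.
        self.returns.append(cumulative_reward)

    def stability_index(self):
        # Variance of recent returns: a larger value means LOWER stability.
        # With fewer than two returns, no spread is observed yet, so report 0.
        if len(self.returns) < 2:
            return 0.0
        return statistics.pvariance(list(self.returns))

    def tradeoff_coefficient(self):
        # Lies in [0, 1) and grows with the variance, so it is negatively
        # correlated with training stability.
        var = self.stability_index()
        return var / (var + self.k)

    def noise_scale(self):
        # c near 0 (stable): noise near sigma_max, favouring exploration.
        # c near 1 (unstable): noise near sigma_min, favouring exploitation.
        c = self.tradeoff_coefficient()
        return (1.0 - c) * self.sigma_max + c * self.sigma_min

    def perturb(self, action, low=-1.0, high=1.0):
        # Add zero-mean Gaussian noise to every action dimension, then clip
        # the result to the assumed action bounds [low, high].
        sigma = self.noise_scale()
        return [min(high, max(low, a + random.gauss(0.0, sigma))) for a in action]
```

For example, after recording the returns 10.0, 10.0, 10.0 the variance is zero and `noise_scale()` stays at `sigma_max`; recording 0.0 and 20.0 next (with `window=5`) raises the variance to 40.0 and pushes the scale down toward `sigma_min`. The bounded form of c is chosen only so that the noise scale cannot leave [`sigma_min`, `sigma_max`]; `k` sets which return variance counts as "unstable" and would have to be tuned to the reward scale of a given task.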
Pages: 3491-3501
Number of pages: 11
Related papers (50 in total)
  • [1] Adaptive exploration policy for exploration-exploitation tradeoff in continuous action control optimization
    Li, Min
    Huang, Tianyi
    Zhu, William
    International Journal of Machine Learning and Cybernetics, 2021, 12: 3491-3501
  • [2] Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits
    Wu, Huasen
    Guo, Xueying
    Liu, Xin
    International Conference on Machine Learning (ICML), Vol. 80, 2018
  • [3] Dopamine, Locus of Control, and the Exploration-Exploitation Tradeoff
    Kayser, Andrew S.
    Mitchell, Jennifer M.
    Weinstein, Dawn
    Frank, Michael J.
    Neuropsychopharmacology, 2015, 40(2): 454-462
  • [4] Social Learning and the Exploration-Exploitation Tradeoff
    Mintz, Brian
    Fu, Feng
    Computation, 2023, 11(5)
  • [5] Optimal Contraction Theorem for Exploration-Exploitation Tradeoff in Search and Optimization
    Chen, Jie
    Xin, Bin
    Peng, Zhihong
    Dou, Lihua
    Zhang, Juan
    IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2009, 39(3): 680-691
  • [6] Adaptive Genetic Algorithm with Exploration-Exploitation Tradeoff for Preprocessing Microarray Datasets
    Rajappan, Sivaraj
    Rangasamy, DeviPriya
    Current Bioinformatics, 2017, 12(5): 441-451
  • [7] The Exploration-Exploitation Tradeoff and Efficiency in Knowledge Production
    Sudhir, K.
    Marketing Science, 2016, 35(1): 1-9
  • [8] Source Coding in the Presence of Exploration-Exploitation Tradeoff
    Akyol, Emrah
    Mitra, Urbashi
    Tuncel, Ertem
    Rose, Kenneth
    2014 IEEE International Symposium on Information Theory (ISIT), 2014: 2057-2061
  • [9] Chicken swarm optimization with an enhanced exploration-exploitation tradeoff and its application
    Wang, Yingcong
    Sui, Chengcheng
    Liu, Chi
    Sun, Junwei
    Wang, Yanfeng
    Soft Computing, 2023, 27(12): 8013-8028