Adaptive exploration policy for exploration-exploitation tradeoff in continuous action control optimization

Cited by: 5

Authors:

Li, Min [1]

Huang, Tianyi [1]

Zhu, William [1]

Affiliations:

[1] Univ Elect Sci & Technol China, Inst Fundamental & Frontier Sci, Chengdu, Peoples R China

Source:

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS | 2021 / Vol. 12 / Issue 12

Keywords:

Continuous action control optimization; Reinforcement learning; Exploration-exploitation tradeoff; Adaptive exploration policy; Stability of training;

DOI:

10.1007/s13042-021-01387-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Continuous action control optimization is an important research field. It aims to find optimal decisions from the experience gained by making decisions in a continuous action control task. This can be done with reinforcement learning, which trains an agent to learn a policy by maximizing the cumulative rewards of its decisions in a dynamic environment. The exploration-exploitation tradeoff is a key issue in learning this policy. The current solution, called an exploration policy, addresses this issue by adding exploration noise to the policy during training, for more efficient exploration while retaining exploitation. This noise is drawn from a distribution that stays fixed throughout training. However, in a dynamic environment the stability of training changes frequently across training episodes, so an exploration policy with fixed noise adapts poorly to the training stability. In this paper, we propose an adaptive exploration policy to address the exploration-exploitation tradeoff. The motivation is that the noise scale should be increased to enhance exploration when the stability of training is high, and reduced to preserve exploitation when the stability of training is low. First, we regard the variance of the cumulative rewards from decisions as an index of the training stability. Then, based on this index, we construct a tradeoff coefficient that is negatively correlated with the training stability. Finally, we propose an adaptive exploration policy that uses the tradeoff coefficient to adjust the added exploration noise so that it adapts to the training stability. Theoretical analysis and experiments illustrate the effectiveness of our adaptive exploration policy. The source code can be downloaded from .
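The three steps named in the abstract (a variance-based stability index, a tradeoff coefficient, and a noise scale adjusted by that coefficient) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption rather than the authors' method: the mapping `1 / (1 + variance)` from variance to stability, the coefficient `1 - stability`, the way the coefficient enters the noise scale, the Gaussian noise, and all function and parameter names are hypothetical.

```python
import random
import statistics


def tradeoff_coefficient(returns):
    """Hypothetical coefficient in [0, 1) that falls as training stability rises.

    Assumption: low variance of recent cumulative rewards means high stability.
    The abstract only says the coefficient is negatively correlated with stability.
    """
    variance = statistics.pvariance(returns)
    stability = 1.0 / (1.0 + variance)  # assumed mapping, value in (0, 1]
    return 1.0 - stability


def noise_scale(returns, base_scale=0.2):
    """Assumed rule: larger noise when training is stable, smaller when unstable."""
    return base_scale * (1.0 - tradeoff_coefficient(returns))


def explore(action, returns, low=-1.0, high=1.0, rng=random):
    """Add Gaussian exploration noise to a scalar action and clip it to the bounds."""
    noisy = action + rng.gauss(0.0, noise_scale(returns))
    return max(low, min(high, noisy))


# Hypothetical cumulative rewards from recent training episodes.
stable_returns = [10.0, 10.1, 9.9, 10.0]
unstable_returns = [2.0, 15.0, -4.0, 20.0]

print(noise_scale(stable_returns) > noise_scale(unstable_returns))
print(explore(0.5, stable_returns))
```

In this sketch a window of nearly identical returns yields a noise scale close to `base_scale`, while widely scattered returns shrink it toward zero, which matches the motivation stated in the abstract. A real implementation would likely normalize the returns before taking the variance, since raw reward magnitudes differ between tasks.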


Pages: 3491-3501

Page count: 11

