Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions

Times cited: 47

Authors:

Busoniu, Lucian [1]

Ernst, Damien [2, 3]

De Schutter, Bart [1, 4]

Babuska, Robert [1]

Affiliations:

[1] Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands

[2] Belgian Natl Fund Sci Res FRS FNRS, B-1000 Brussels, Belgium

[3] Univ Liege, Syst & Modeling Res Unit, B-4000 Liege, Belgium

[4] Delft Univ Technol, Marine & Transport Technol Dept, NL-2628 CD Delft, Netherlands

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | 2011 / Vol. 41 / No. 01

Keywords:

Adaptive basis functions; cross-entropy optimization; direct policy search; Markov decision processes; gradient methods; reinforcement

DOI:

10.1109/TSMCB.2010.2050586

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

This paper introduces an algorithm for direct search of control policies in continuous-state, discrete-action Markov decision processes. The algorithm looks for the best closed-loop policy that can be represented using a given number of basis functions (BFs), where a discrete action is assigned to each BF. The type of the BFs and their number are specified in advance and determine the complexity of the representation. Considerable flexibility is achieved by optimizing the locations and shapes of the BFs, together with the action assignments. The optimization is carried out with the cross-entropy method and evaluates the policies by their empirical return from a representative set of initial states. The return for each representative state is estimated using Monte Carlo simulations. The resulting algorithm for cross-entropy policy search with adaptive BFs is extensively evaluated in problems with two to six state variables, for which it reliably obtains good policies with only a small number of BFs. In these experiments, cross-entropy policy search requires vastly fewer BFs than value-function techniques with equidistant BFs, and outperforms policy search with a competing optimization algorithm called DIRECT.
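The abstract describes the search loop only in words. The Python sketch below illustrates the general idea on a hypothetical one-dimensional toy problem. A policy is made of a fixed number of Gaussian BFs, each assigned one discrete action. The cross-entropy method optimizes their centers, widths, and action assignments, and it scores each candidate by its Monte Carlo discounted return averaged over a small representative set of initial states. Everything concrete here is an assumption made for illustration, not the authors' implementation: the toy dynamics and reward, the rule of applying the action of the most strongly activated BF, the Gaussian/categorical sampling distributions, the smoothed update, and all names and values (`ce_policy_search`, `n_samples`, `elite_frac`, `alpha`, `states`, `n_rollouts`).

```python
import math
import random

# Hypothetical toy problem (illustration only, not from the paper):
# 1-D state x in [-1, 1], two discrete actions, reward -|x| (keep x near 0).
ACTIONS = (-0.1, 0.1)
GAMMA = 0.95
HORIZON = 40


def step(x, u, rng):
    """Noisy integrator: x' = clip(x + u + noise, -1, 1), reward = -|x'|."""
    x_next = min(1.0, max(-1.0, x + u + rng.gauss(0.0, 0.01)))
    return x_next, -abs(x_next)


def policy(params, x):
    """Apply the action of the most strongly activated Gaussian BF (assumed rule)."""
    centers, widths, action_idx = params
    # Compare log-activations -((x - c) / w)^2 to avoid exp() underflow.
    logs = [-((x - c) / w) ** 2 for c, w in zip(centers, widths)]
    i_star = max(range(len(logs)), key=logs.__getitem__)
    return ACTIONS[action_idx[i_star]]


def score(params, states, n_rollouts, rng):
    """Monte Carlo estimate of the discounted return, averaged over the states."""
    total = 0.0
    for x0 in states:
        for _ in range(n_rollouts):
            x, ret, disc = x0, 0.0, 1.0
            for _ in range(HORIZON):
                x, r = step(x, policy(params, x), rng)
                ret += disc * r
                disc *= GAMMA
            total += ret
    return total / (len(states) * n_rollouts)


def ce_policy_search(n_bfs=2, n_samples=50, elite_frac=0.1, alpha=0.7,
                     iters=15, states=(-0.8, 0.8), n_rollouts=3, seed=0):
    rng = random.Random(seed)
    # Independent Gaussians for centers and widths, categorical for actions.
    mu_c, sd_c = [0.0] * n_bfs, [1.0] * n_bfs
    mu_w, sd_w = [0.5] * n_bfs, [0.5] * n_bfs
    probs = [[1.0 / len(ACTIONS)] * len(ACTIONS) for _ in range(n_bfs)]
    n_elite = max(1, int(elite_frac * n_samples))
    best, best_score = None, -math.inf
    for _ in range(iters):
        samples = []
        for _ in range(n_samples):
            centers = [rng.gauss(m, s) for m, s in zip(mu_c, sd_c)]
            widths = [max(0.05, abs(rng.gauss(m, s))) for m, s in zip(mu_w, sd_w)]
            action_idx = [rng.choices(range(len(ACTIONS)), weights=p)[0]
                          for p in probs]
            params = (centers, widths, action_idx)
            samples.append((score(params, states, n_rollouts, rng), params))
        samples.sort(key=lambda t: t[0], reverse=True)
        if samples[0][0] > best_score:
            best_score, best = samples[0]
        elite = [p for _, p in samples[:n_elite]]
        # Smoothed update: move each distribution toward the elite statistics.
        for i in range(n_bfs):
            for mu, sd, k in ((mu_c, sd_c, 0), (mu_w, sd_w, 1)):
                vals = [p[k][i] for p in elite]
                m = sum(vals) / len(vals)
                s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
                mu[i] = alpha * m + (1 - alpha) * mu[i]
                sd[i] = alpha * s + (1 - alpha) * sd[i]
            freq = [sum(p[2][i] == a for p in elite) / len(elite)
                    for a in range(len(ACTIONS))]
            probs[i] = [alpha * f + (1 - alpha) * q for f, q in zip(freq, probs[i])]
    return best, best_score


best, best_score = ce_policy_search()
print(round(best_score, 2))
```

Keeping `n_bfs` fixed mirrors the abstract's point that the representation complexity is chosen in advance, while the BF locations, shapes, and action assignments adapt. The smoothing factor `alpha` is one common way to keep the sampling distributions from collapsing too quickly onto an early elite set.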


Pages: 196-209

Number of pages: 14