Cross Entropy Optimization of Action Modification Policies for Continuous-Valued MDPs

Cited by: 0

Authors:

Mirkamali, Kamelia [1]

Busoniu, Lucian [2]

Affiliations:

[1] Khaje Nasir Toosi Univ Technol, Dept Comp Engn, Tehran, Iran

[2] Tech Univ Cluj Napoca, Automat Dept, Cluj Napoca, Romania

Source:

IFAC PAPERSONLINE | 2020, Vol. 53, Issue 2

Keywords:

Markov decision processes; policy search; cross-entropy optimization; continuous actions

DOI:

10.1016/j.ifacol.2020.12.2292

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We propose an algorithm to search for parametrized policies in continuous-state, continuous-action Markov decision processes (MDPs). The policies are represented by a set of basis functions, and the main novelty is that each basis function corresponds to a small, discrete modification of the continuous action. In each state, the policy applies the discrete action modification associated with the basis function that has the maximum value at that state. Policies are evaluated by estimating, in simulation, the empirical returns from a representative set of initial states. Instead of slow gradient-based algorithms, we apply the cross-entropy method to update the parameters. The proposed algorithm is applied to a double integrator and an inverted pendulum problem, with encouraging results. Copyright (C) 2020 The Authors.
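The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; the following are all assumptions made for illustration: Gaussian radial basis functions, a fixed assignment of one action modification per basis function (`deltas`), a Gaussian sampling distribution over centers and widths, a toy Euler-discretized double integrator with a quadratic reward, and every function name and numeric setting shown.

```python
import numpy as np

def policy_action(x, u_prev, centers, widths, deltas, u_max=1.0):
    """Apply the action modification tied to the most active basis function."""
    # Gaussian RBF activations (assumed basis-function family).
    act = np.exp(-np.sum(((x - centers) / widths) ** 2, axis=1))
    return float(np.clip(u_prev + deltas[np.argmax(act)], -u_max, u_max))

def rollout_return(params, x0, n_bf, deltas, steps=50, dt=0.1, gamma=0.95):
    """Discounted return on a toy double integrator (assumed dynamics and reward)."""
    centers = params[: 2 * n_bf].reshape(n_bf, 2)
    widths = np.abs(params[2 * n_bf:]).reshape(n_bf, 2) + 1e-3
    x, u, ret = np.array(x0, dtype=float), 0.0, 0.0
    for k in range(steps):
        u = policy_action(x, u, centers, widths, deltas)
        x = x + dt * np.array([x[1], u])  # Euler step for (position, velocity)
        ret += gamma ** k * -(x[0] ** 2 + 0.1 * x[1] ** 2)
    return ret

def cross_entropy_search(score, dim, iters=10, pop=30, elite_frac=0.2, seed=0):
    """Basic cross-entropy search with an independent Gaussian per parameter."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    best_params, best_score = mean.copy(), score(mean)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        scores = np.array([score(s) for s in samples])
        # Refit the sampling distribution to the highest-scoring samples.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
        if scores.max() > best_score:
            best_score = float(scores.max())
            best_params = samples[np.argmax(scores)].copy()
    return best_params, best_score

n_bf = 4
deltas = np.array([-0.2, -0.05, 0.05, 0.2])      # hypothetical action modifications
x0_set = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # hypothetical initial states

def score(params):
    """Average empirical return over the set of initial states."""
    return float(np.mean([rollout_return(params, x0, n_bf, deltas) for x0 in x0_set]))

best_params, best_score = cross_entropy_search(score, dim=4 * n_bf)
```

Here the score of a parameter vector is the average simulated return over the initial states, and the search keeps the best sample seen so far. Whether the assignment of modifications to basis functions is itself optimized is not stated in the abstract, so the sketch keeps it fixed.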


Pages: 8124-8129

Page count: 6
