Cross Entropy Optimization of Action Modification Policies for Continuous-Valued MDPs

Cited by: 0
Authors
Mirkamali, Kamelia [1 ]
Busoniu, Lucian [2 ]
Affiliations
[1] Khaje Nasir Toosi Univ Technol, Dept Comp Engn, Tehran, Iran
[2] Tech Univ Cluj Napoca, Automat Dept, Cluj Napoca, Romania
Source
IFAC PAPERSONLINE | 2020 / Vol. 53 / Issue 02
Keywords
Markov decision processes; policy search; cross-entropy optimization; continuous actions;
DOI
10.1016/j.ifacol.2020.12.2292
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
We propose an algorithm to search for parametrized policies in Markov Decision Processes (MDPs) with continuous states and actions. The policies are represented via a number of basis functions, and the main novelty is that each basis function corresponds to a small, discrete modification of the continuous action. In each state, the policy chooses the discrete action modification associated with the basis function that has the maximum value at the current state. To evaluate the policies, empirical returns from a representative set of initial states are estimated in simulations. Instead of using slow gradient-based algorithms, we apply the cross-entropy method to update the parameters. The proposed algorithm is applied to a double integrator and an inverted pendulum problem, with encouraging results. Copyright (C) 2020 The Authors.
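The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation; every concrete choice is an assumption made for illustration: Gaussian basis functions, the modification set `DELTAS`, a fixed cyclic assignment of modifications to basis functions, a simple double-integrator model with a quadratic reward, and a Gaussian sampling distribution for the cross-entropy update. All function and constant names are hypothetical.

```python
import numpy as np

DELTAS = np.array([-0.2, 0.0, 0.2])  # assumed discrete action modifications
N_BF = 6                             # assumed number of basis functions
U_MAX = 1.0                          # assumed bound on the continuous action

def unpack(theta):
    """Split a parameter vector into basis-function centers and widths."""
    centers = theta[:2 * N_BF].reshape(N_BF, 2)
    widths = np.abs(theta[2 * N_BF:].reshape(N_BF, 2)) + 1e-3
    return centers, widths

def policy(theta, x, u_prev):
    """Modify the previous action by the delta of the maximal basis function."""
    centers, widths = unpack(theta)
    phi = np.exp(-np.sum(((x - centers) / widths) ** 2, axis=1))
    # Assumption: modifications are assigned to basis functions cyclically.
    delta = DELTAS[np.argmax(phi) % len(DELTAS)]
    return float(np.clip(u_prev + delta, -U_MAX, U_MAX))

def step(x, u, dt=0.1):
    """Assumed double-integrator dynamics with a quadratic reward."""
    p, v = x
    reward = -(p ** 2 + 0.1 * v ** 2 + 0.01 * u ** 2)
    return np.array([p + dt * v, v + dt * u]), reward

def empirical_return(theta, x0_set, horizon=50, gamma=0.95):
    """Average discounted return over a set of initial states."""
    total = 0.0
    for x0 in x0_set:
        x, u = np.array(x0, dtype=float), 0.0
        for k in range(horizon):
            u = policy(theta, x, u)
            x, r = step(x, u)
            total += gamma ** k * r
    return total / len(x0_set)

def cross_entropy_search(x0_set, iters=10, n_samples=30, n_elite=5, seed=0):
    """Gaussian cross-entropy search over the policy parameters."""
    rng = np.random.default_rng(seed)
    dim = 4 * N_BF
    mean, std = np.zeros(dim), np.ones(dim)
    best_theta, best_score = mean.copy(), empirical_return(mean, x0_set)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([empirical_return(th, x0_set) for th in samples])
        # Refit the sampling distribution to the highest-scoring samples.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
        if scores.max() > best_score:
            best_theta = samples[np.argmax(scores)].copy()
            best_score = float(scores.max())
    return best_theta, best_score
```

A call such as `cross_entropy_search([(-1.0, 0.0), (1.0, 0.0)])` returns the best parameter vector found and its estimated return. Because the update only needs sampled returns, no gradient of the policy is required, which is the property the abstract contrasts with gradient-based search.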
Pages: 8124 - 8129
Number of pages: 6
Related papers
13 in total
  • [1] Continuous-valued map reconstruction with the Bayesian Maximum Entropy
    D'Or, D
    Bogaert, P
    GEODERMA, 2003, 112 (3-4) : 169 - 178
  • [2] Specific Differential Entropy Rate Estimation for Continuous-Valued Time Series
    Darmon, David
    ENTROPY, 2016, 18 (05)
  • [3] Sample-Efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs
    Low, Siow Meng
    Kumar, Akshat
    Sanner, Scott
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 9840 - 9848
  • [5] Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions
    Busoniu, Lucian
    Ernst, Damien
    De Schutter, Bart
    Babuska, Robert
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2011, 41 (01): : 196 - 209
  • [6] The cross-entropy method for continuous multi-extremal optimization
    Kroese, Dirk P.
    Porotsky, Sergey
    Rubinstein, Reuven Y.
    METHODOLOGY AND COMPUTING IN APPLIED PROBABILITY, 2006, 8 (03) : 383 - 407
  • [8] Cross Entropy Method Meets Local Search for Continuous Optimization Problems
    Zhang, Xin
    Zhang, Xiu
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2017, 26 (06)
  • [9] Riemann Integral Based Cross-Entropy for Continuous Function Valued Intuitionistic Fuzzy Sets and an Extended CODAS
    M. Ünver
    G. Özçelik
    Lobachevskii Journal of Mathematics, 2024, 45 (9) : 4404 - 4425
  • [10] GACE: A meta-heuristic based in the hybridization of Genetic Algorithms and Cross Entropy methods for continuous optimization
    Lopez-Garcia, P.
    Onieva, E.
    Osaba, E.
    Masegosa, A. D.
    Perallos, A.
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 55 : 508 - 519