The Cross-Entropy Method for Policy Search in Decentralized POMDPs

Cited by: 0
Authors
Oliehoek, Frans A. [1]
Kooij, Julian F. P. [1]
Vlassis, Nikos [2]
Affiliations
[1] Univ Amsterdam, Intelligent Syst Lab, Amsterdam, Netherlands
[2] Tech Univ Crete, Dept Prod Engn & Management, Iraklion, Greece
Keywords
multiagent planning; decentralized POMDPs; combinatorial optimization
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. The algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy and evaluating these policies, either exactly or approximately, in order to define the next stochastic policy to sample from; this is repeated until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
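The sample, evaluate, and update loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cross_entropy_search`, its parameters (`n_samples`, `n_elite`, `alpha`), the elite-fraction update with smoothing, and the toy objective `toy_value` are all assumptions made for illustration. In particular, the sketch treats a pure policy as a flat tuple with one action index per decision point and replaces Dec-POMDP policy evaluation with a simple matching score.

```python
import random


def cross_entropy_search(evaluate, n_points, n_actions, n_samples=50,
                         n_elite=10, alpha=0.7, n_iters=30, seed=0):
    """Generic CE search over pure policies (one action index per decision point).

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Stochastic policy: one independent categorical distribution per decision point.
    probs = [[1.0 / n_actions] * n_actions for _ in range(n_points)]
    best_policy, best_value = None, float("-inf")
    for _ in range(n_iters):
        # Sample pure policies from the current stochastic policy.
        samples = [
            tuple(rng.choices(range(n_actions), weights=p)[0] for p in probs)
            for _ in range(n_samples)
        ]
        # Evaluate every sample and rank from best to worst.
        scored = sorted(((evaluate(s), s) for s in samples), reverse=True)
        if scored[0][0] > best_value:
            best_value, best_policy = scored[0]
        elite = [s for _, s in scored[:n_elite]]
        # Define the next stochastic policy from elite action frequencies,
        # smoothed with the previous distribution by the factor alpha.
        for i in range(n_points):
            for a in range(n_actions):
                freq = sum(1 for s in elite if s[i] == a) / n_elite
                probs[i][a] = alpha * freq + (1 - alpha) * probs[i][a]
    return best_policy, best_value


# Toy stand-in for policy evaluation (an assumption, not a Dec-POMDP):
# the value is the number of decision points that match a hidden target.
target = (0, 2, 1, 1, 0, 2, 2, 1)


def toy_value(policy):
    return sum(1 for a, t in zip(policy, target) if a == t)


policy, value = cross_entropy_search(toy_value, n_points=len(target), n_actions=3)
print(policy, value)
```

In an actual Dec-POMDP setting, `evaluate` would compute or estimate the expected return of a joint policy, which corresponds to the exact or approximate evaluation mentioned in the abstract.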
Pages: 341-357
Number of pages: 17