The Cross-Entropy Method for Policy Search in Decentralized POMDPs

Cited by: 0

Authors:

Oliehoek, Frans A. [1]

Kooij, Julian F. P. [1]

Vlassis, Nikos [2]

Affiliations:

[1] Univ Amsterdam, Intelligent Syst Lab, Amsterdam, Netherlands

[2] Tech Univ Crete, Dept Prod Engn & Management, Iraklion, Greece

Source:

INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS | 2008 / Vol. 32 / No. 04

Keywords:

multiagent planning; decentralized POMDPs; combinatorial optimization

DOI:

Not available

Chinese Library Classification number:

TP31 [Computer software]

Discipline classification code:

081202; 0835

Abstract:

Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
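
The sample, evaluate, and update loop described in the abstract can be illustrated with a minimal, generic cross-entropy sketch. This is not the authors' Dec-POMDP algorithm. It assumes a toy setting in which a "pure policy" is a tuple of discrete choices, one per decision point, and the "stochastic policy" is an independent categorical distribution per decision point. The names `cross_entropy_search`, `evaluate_toy`, and `TARGET`, and the parameters `n_samples`, `n_elite`, and `alpha`, are hypothetical and chosen only for illustration; a real Dec-POMDP would replace `evaluate_toy` with exact or approximate policy evaluation.

```python
import random

def cross_entropy_search(n_decisions, n_actions, evaluate,
                         n_samples=50, n_elite=10, alpha=0.7,
                         n_iters=30, seed=0):
    """Generic CE search over tuples of discrete choices (illustrative only)."""
    rng = random.Random(seed)
    # probs[d][a]: probability of picking action a at decision point d.
    # Start from the uniform stochastic policy.
    probs = [[1.0 / n_actions] * n_actions for _ in range(n_decisions)]
    best_policy, best_value = None, float("-inf")

    for _ in range(n_iters):
        # 1. Sample pure (deterministic) policies from the stochastic policy.
        samples = [
            tuple(rng.choices(range(n_actions), weights=probs[d])[0]
                  for d in range(n_decisions))
            for _ in range(n_samples)
        ]
        # 2. Evaluate the samples and keep the best-performing ones.
        elite = sorted(samples, key=evaluate, reverse=True)[:n_elite]
        top_value = evaluate(elite[0])
        if top_value > best_value:
            best_policy, best_value = elite[0], top_value
        # 3. Move the distribution toward the empirical frequencies of the
        #    elite set; alpha < 1 smooths the update so that no probability
        #    collapses to zero in a single step.
        for d in range(n_decisions):
            counts = [0] * n_actions
            for policy in elite:
                counts[policy[d]] += 1
            for a in range(n_actions):
                freq = counts[a] / n_elite
                probs[d][a] = alpha * freq + (1 - alpha) * probs[d][a]

    return best_policy, best_value, probs

# Hypothetical toy objective standing in for policy evaluation:
# the value of a policy is the number of positions matching a hidden target.
TARGET = (2, 0, 1, 1, 0, 2, 1, 0)

def evaluate_toy(policy):
    return sum(1 for chosen, wanted in zip(policy, TARGET) if chosen == wanted)

policy, value, probs = cross_entropy_search(len(TARGET), 3, evaluate_toy)
print(policy, value)
```

The smoothed update is a common design choice in CE implementations because it reduces the risk of premature convergence. The fixed iteration count here is a simplification of the "until convergence" stopping rule mentioned in the abstract.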

Pages: 341-357

Page count: 17

