Bounded Policy Iteration for Decentralized POMDPs

Cited: 0
Authors
Bernstein, Daniel S. [1]
Hansen, Eric A.
Zilberstein, Shlomo [1]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs.
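The correlation-device idea in the abstract can be illustrated with a toy sketch. The model, names, and numbers below are ours, not from the paper: a shared random signal plays the role of the correlation device, and each agent's local controller conditions its action on that signal, so the agents coordinate without exchanging information at execution time.

```python
import random

def sample(dist):
    """Draw a key from a {outcome: probability} dict."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome

def match_rate(use_device, steps=10000):
    """Fraction of steps on which two agents choose the same action."""
    matches = 0
    for _ in range(steps):
        signal = random.choice(["c0", "c1"])  # correlation-device signal
        actions = []
        for _agent in range(2):
            if use_device:
                # Each local controller is deterministic given the shared
                # signal, so the agents always agree.
                actions.append("a" if signal == "c0" else "b")
            else:
                # Without the device, independent 50/50 randomization
                # agrees only about half the time.
                actions.append(sample({"a": 0.5, "b": 0.5}))
        matches += actions[0] == actions[1]
    return matches / steps

random.seed(0)
print(match_rate(use_device=True))   # 1.0
print(match_rate(use_device=False))  # close to 0.5
```

This captures only the coordination benefit the abstract claims for the correlation device; the paper's actual algorithm additionally improves the stochastic controllers themselves via bounded policy iteration.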
Pages: 1287 - 1292
Page count: 6
Related Papers
50 records in total
  • [41] Point-based value iteration for finite-horizon POMDPs
    Walraven, Erwin
    Spaan, Matthijs T.J.
    Journal of Artificial Intelligence Research, 2019, 65 : 307 - 341
  • [42] Bounded iteration and unary functions
    Mazzanti, S
    MATHEMATICAL LOGIC QUARTERLY, 2005, 51 (01) : 89 - 94
  • [43] Memory-Bounded Dynamic Programming for DEC-POMDPs
    Seuken, Sven
    Zilberstein, Shlomo
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2009 - 2015
  • [44] Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement
    Lauri, Mikko
    Pajarinen, Joni
    Peters, Jan
    Autonomous Agents and Multi-Agent Systems, 2020, 34
  • [46] Decentralized Learning of Finite-Memory Policies in Dec-POMDPs
    Mao, Weichao
    Zhang, Kaiqing
    Yang, Zhuoran
    Basar, Tamer
    IFAC PAPERSONLINE, 2023, 56 (02): : 2601 - 2607
  • [47] Search and Explore: Symbiotic Policy Synthesis in POMDPs
    Andriushchenko, Roman
    Bork, Alexander
    Ceska, Milan
    Junges, Sebastian
    Katoen, Joost-Pieter
    Macak, Filip
    COMPUTER AIDED VERIFICATION, CAV 2023, PT III, 2023, 13966 : 113 - 135
  • [48] Factorized Asymptotic Bayesian Policy Search for POMDPs
    Imaizumi, Masaaki
    Fujimaki, Ryohei
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4346 - 4352
  • [49] Goal-HSVI: Heuristic Search Value Iteration for Goal-POMDPs
    Horak, Karel
    Bosansky, Branislav
    Chatterjee, Krishnendu
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 4764 - 4770
  • [50] On policy iteration as a Newton's method and polynomial policy iteration algorithms
    Madani, O
    EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, 2002, : 273 - 278