Bounded Policy Iteration for Decentralized POMDPs

Cited by: 0

Authors:

Bernstein, Daniel S. [1]

Hansen, Eric A.

Zilberstein, Shlomo [1]

Affiliations:

[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA

Source:

19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05) | 2005

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs.
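To make the controller representation in the abstract concrete, here is a minimal Python sketch of local stochastic finite-state controllers that share a correlation device. It is not the authors' implementation: the class names, the dictionary layout, and the toy two-action example are all assumptions made for illustration. The sketch assumes that each agent's action and next-node distributions may depend on a shared random signal whose transitions are independent of the agents and the environment, so no information is exchanged during execution.

```python
import random


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


class CorrelationDevice:
    """Shared signal source (hypothetical layout): signal -> {next_signal: prob}."""

    def __init__(self, signal_dist, start_signal):
        self.signal_dist = signal_dist
        self.signal = start_signal

    def step(self, rng):
        # The transition depends only on the current signal.
        self.signal = sample(self.signal_dist[self.signal], rng)
        return self.signal


class LocalController:
    """Stochastic finite-state controller for a single agent (hypothetical layout).

    action_dist[(node, signal)] -> {action: prob}
    node_dist[(node, signal, action, observation)] -> {next_node: prob}
    """

    def __init__(self, action_dist, node_dist, start_node):
        self.action_dist = action_dist
        self.node_dist = node_dist
        self.node = start_node

    def act(self, signal, rng):
        return sample(self.action_dist[(self.node, signal)], rng)

    def update(self, signal, action, observation, rng):
        key = (self.node, signal, action, observation)
        self.node = sample(self.node_dist[key], rng)


def make_agent():
    """One-node toy controller: signal 0 -> 'left', signal 1 -> 'right'."""
    action_dist = {("q", 0): {"left": 1.0}, ("q", 1): {"right": 1.0}}
    node_dist = {
        ("q", s, a, "o"): {"q": 1.0}
        for s in (0, 1)
        for a in ("left", "right")
    }
    return LocalController(action_dist, node_dist, "q")


def run_demo(steps=20, seed=0):
    """Simulate two agents and return the list of joint actions."""
    rng = random.Random(seed)
    uniform = {0: 0.5, 1: 0.5}
    device = CorrelationDevice({0: uniform, 1: uniform}, start_signal=0)
    agents = [make_agent(), make_agent()]
    joint_actions = []
    for _ in range(steps):
        signal = device.step(rng)  # both agents read the same signal
        actions = [agent.act(signal, rng) for agent in agents]
        for agent, action in zip(agents, actions):
            agent.update(signal, action, "o", rng)  # dummy observation "o"
        joint_actions.append(tuple(actions))
    return joint_actions
```

In this toy setup the two agents always choose matching actions, even though neither observes the other, because both condition on the same signal. A full bounded policy iteration step would additionally evaluate and improve these distributions, which this sketch does not attempt.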

Pages: 1287-1292

Page count: 6
