Bounded Policy Iteration for Decentralized POMDPs

Times cited: 0
Authors
Bernstein, Daniel S. [1]
Hansen, Eric A.
Zilberstein, Shlomo [1]
Affiliation
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs.
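To make the controller model in the abstract more concrete, here is a minimal Python sketch of the policy-evaluation half of such an algorithm: it computes the value V(s, q, c) of a fixed joint controller (world state s, joint node q, correlation-device state c) by repeated Bellman backups. The parameterization is an assumption based on the description above: agent i chooses action a_i with probability psi_i(a_i | q_i, c), moves to node q_i' with probability eta_i(q_i' | q_i, a_i, o_i, c), and the correlation device moves from c to c' with probability P(c' | c), independently of the world. All names (`evaluate_joint_controller`, `psi`, `eta`, `corr`) are hypothetical. The improvement step, which adjusts one node's parameters at a time by linear programming, is not shown, and this is not the authors' implementation.

```python
from itertools import product


def evaluate_joint_controller(S, A, Obs, T, O, R, gamma, nodes, psi, eta, corr,
                              tol=1e-8, max_iter=100_000):
    """Iterative evaluation of a fixed joint stochastic controller (a sketch).

    S: list of states; A[i], Obs[i], nodes[i]: local actions, observations, nodes of agent i.
    T(s, a, s2), O(s2, a, o), R(s, a) take joint-action / joint-observation tuples.
    psi[i][(q, c)] -> {a_i: prob}           (hypothetical action-selection table)
    eta[i][(q, a_i, o_i, c)] -> {q2: prob}  (hypothetical node-transition table)
    corr[c] -> {c2: prob}                   (correlation-device transitions)
    Returns V[(s, joint_node, c)].
    """
    n = len(nodes)
    joint_nodes = list(product(*nodes))
    joint_actions = list(product(*A))
    joint_obs = list(product(*Obs))
    V = {(s, q, c): 0.0 for s in S for q in joint_nodes for c in corr}
    for _ in range(max_iter):
        new_V = {}
        for (s, q, c) in V:
            total = 0.0
            for a in joint_actions:
                # Agents choose actions independently given the shared signal c.
                p_a = 1.0
                for i in range(n):
                    p_a *= psi[i][(q[i], c)].get(a[i], 0.0)
                if p_a == 0.0:
                    continue
                future = 0.0
                for s2, o in product(S, joint_obs):
                    p = T(s, a, s2) * O(s2, a, o)
                    if p == 0.0:
                        continue
                    for q2 in joint_nodes:
                        # Each local controller updates its node from its own observation only.
                        p_q2 = 1.0
                        for i in range(n):
                            p_q2 *= eta[i][(q[i], a[i], o[i], c)].get(q2[i], 0.0)
                        if p_q2 == 0.0:
                            continue
                        cont = sum(pc * V[(s2, q2, c2)] for c2, pc in corr[c].items())
                        future += p * p_q2 * cont
                total += p_a * (R(s, a) + gamma * future)
            new_V[(s, q, c)] = total
        delta = max(abs(new_V[k] - V[k]) for k in V)
        V = new_V
        if delta < tol:
            break
    return V


# Toy coordination game (an illustration, not from the paper): one state,
# two agents, reward 1 when their actions match, discount 0.9.
S, A, Obs, nodes = [0], [[0, 1], [0, 1]], [[0], [0]], [[0], [0]]


def T(s, a, s2):
    return 1.0


def O(s2, a, o):
    return 1.0


def R(s, a):
    return 1.0 if a[0] == a[1] else 0.0


# (1) Trivial device (a single state): each agent flips its own fair coin.
psi_ind = [{(0, 0): {0: 0.5, 1: 0.5}} for _ in range(2)]
eta_ind = [{(0, a, 0, 0): {0: 1.0} for a in (0, 1)} for _ in range(2)]
V_ind = evaluate_joint_controller(S, A, Obs, T, O, R, 0.9, nodes,
                                  psi_ind, eta_ind, {0: {0: 1.0}})

# (2) Correlation device emitting a fair random bit; both agents play that bit.
corr = {c: {0: 0.5, 1: 0.5} for c in (0, 1)}
psi_cor = [{(0, c): {c: 1.0} for c in (0, 1)} for _ in range(2)]
eta_cor = [{(0, a, 0, c): {0: 1.0} for a in (0, 1) for c in (0, 1)} for _ in range(2)]
V_cor = evaluate_joint_controller(S, A, Obs, T, O, R, 0.9, nodes,
                                  psi_cor, eta_cor, corr)

print(V_ind[(0, (0, 0), 0)], V_cor[(0, (0, 0), 0)])  # approximately 5.0 and 10.0
```

In the toy game, independent fair coins match only half the time, so the expected reward is 0.5 per step and the value is about 0.5 / (1 - 0.9) = 5. When both agents copy the device's shared random bit, they still randomize but never miscoordinate, giving a value of about 10. This only illustrates the mechanism; it is not one of the paper's experiments.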
Pages: 1287-1292
Page count: 6
Related papers (50 in total)
  • [31] Decentralized Multi-Robot Cooperation with Auctioned POMDPs
    Capitan, Jesus
    Spaan, Matthijs T. J.
    Merino, Luis
    Ollero, Anibal
    2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2012: 3323 - 3328
  • [32] Robust topological policy iteration for infinite horizon bounded Markov Decision Processes
    Silva Reis, Willy Arthur
    de Barros, Leliane Nunes
    Delgado, Karina Valdivia
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2019, 105 : 287 - 304
  • [33] Point-based Value Iteration for VAR-POMDPs
    Zheng, Wei
    Lin, Hai
    2021 AMERICAN CONTROL CONFERENCE (ACC), 2021: 1143 - 1148
  • [34] Perseus: Randomized point-based value iteration for POMDPs
    Spaan, MTJ
    Vlassis, N
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2005, 24 : 195 - 220
  • [35] Monte Carlo Value Iteration for Continuous-State POMDPs
    Bai, Haoyu
    Hsu, David
    Lee, Wee Sun
    Ngo, Vien A.
    ALGORITHMIC FOUNDATIONS OF ROBOTICS IX, 2010, 68 : 175 - 191
  • [36] Point-based online value iteration algorithm for POMDPs
    Wu, Bo
    Wu, Min
    She, Jin-Hua
    Ruan Jian Xue Bao/Journal of Software, 2013, 24 (01): 25 - 36
  • [37] Decentralized Optimal Neurocontroller Design for Mismatched Interconnected Systems via Integral Policy Iteration
    Wang, Ding
    Fan, Wenqian
    Liu, Ao
    Qiao, Junfei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (02) : 687 - 691
  • [38] Optimal and approximate Q-value functions for decentralized POMDPs
    Oliehoek, Frans A.
    Spaan, Matthijs T. J.
    Vlassis, Nikos
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2008, 32 : 289 - 353
  • [39] Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement
    Lauri, Mikko
    Pajarinen, Joni
    Peters, Jan
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2020, 34 (02)
  • [40] Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs
    Oliehoek, Frans A.
    Spaan, Matthijs T. J.
    Amato, Christopher
    Whiteson, Shimon
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2013, 46 : 449 - 509