Bounded Policy Iteration for Decentralized POMDPs

Cited: 0
Authors
Bernstein, Daniel S. [1]
Hansen, Eric A.
Zilberstein, Shlomo [1]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this leads to improved performance. The algorithm uses a fixed amount of memory, and each iteration is guaranteed to produce a controller with value at least as high as the previous one for all possible initial state distributions. For the case of a single agent, the algorithm reduces to Poupart and Boutilier's bounded policy iteration for POMDPs.
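The correlation-device idea in the abstract can be illustrated with a toy sketch. The model, names, and numbers below are ours, not from the paper: a shared random signal plays the role of the correlation device, and each agent's local controller conditions its action on that signal, so the agents coordinate without exchanging information at execution time.

```python
import random

def sample(dist):
    """Draw a key from a {outcome: probability} dict."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome

def match_rate(use_device, steps=10000):
    """Fraction of steps on which two agents choose the same action."""
    matches = 0
    for _ in range(steps):
        signal = random.choice(["c0", "c1"])  # correlation-device signal
        actions = []
        for _agent in range(2):
            if use_device:
                # Each local controller is deterministic given the shared
                # signal, so the agents always agree.
                actions.append("a" if signal == "c0" else "b")
            else:
                # Without the device, independent 50/50 randomization
                # agrees only about half the time.
                actions.append(sample({"a": 0.5, "b": 0.5}))
        matches += actions[0] == actions[1]
    return matches / steps

random.seed(0)
print(match_rate(use_device=True))   # 1.0
print(match_rate(use_device=False))  # close to 0.5
```

This captures only the coordination benefit the abstract claims for the correlation device; the paper's actual algorithm additionally improves the stochastic controllers themselves via bounded policy iteration.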
Pages: 1287 - 1292
Page count: 6
Related Papers
50 records in total
  • [41] Point-based value iteration for finite-horizon POMDPs
    Walraven, Erwin
    Spaan, Matthijs T.J.
    Journal of Artificial Intelligence Research, 2019, 65 : 307 - 341
  • [42] Bounded iteration and unary functions
    Mazzanti, S
    MATHEMATICAL LOGIC QUARTERLY, 2005, 51 (01) : 89 - 94
  • [43] Memory-Bounded Dynamic Programming for DEC-POMDPs
    Seuken, Sven
    Zilberstein, Shlomo
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2009 - 2015
  • [44] Multi-agent active information gathering in discrete and continuous-state decentralized POMDPs by policy graph improvement
    Lauri, Mikko
    Pajarinen, Joni
    Peters, Jan
    Autonomous Agents and Multi-Agent Systems, 2020, 34
  • [46] Decentralized Learning of Finite-Memory Policies in Dec-POMDPs
    Mao, Weichao
    Zhang, Kaiqing
    Yang, Zhuoran
    Basar, Tamer
    IFAC PAPERSONLINE, 2023, 56 (02): : 2601 - 2607
  • [47] Search and Explore: Symbiotic Policy Synthesis in POMDPs
    Andriushchenko, Roman
    Bork, Alexander
    Ceska, Milan
    Junges, Sebastian
    Katoen, Joost-Pieter
    Macak, Filip
    COMPUTER AIDED VERIFICATION, CAV 2023, PT III, 2023, 13966 : 113 - 135
  • [48] Factorized Asymptotic Bayesian Policy Search for POMDPs
    Imaizumi, Masaaki
    Fujimaki, Ryohei
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4346 - 4352
  • [49] Goal-HSVI: Heuristic Search Value Iteration for Goal-POMDPs
    Horak, Karel
    Bosansky, Branislav
    Chatterjee, Krishnendu
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 4764 - 4770
  • [50] On policy iteration as a Newton's method and polynomial policy iteration algorithms
    Madani, O
    EIGHTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-02)/FOURTEENTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE (IAAI-02), PROCEEDINGS, 2002, : 273 - 278