Scalable solutions of interactive POMDPs using generalized and bounded policy iteration

Cited by: 12
Authors:
Sonu, Ekhlas [1 ]
Doshi, Prashant [1 ]
Affiliations:
[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA
Funding:
National Science Foundation (USA)
Keywords:
Decision making; Multiagent settings; Policy iteration; POMDP; Observable Markov processes; Decentralized control; Algorithms; Complexity
DOI: 10.1007/s10458-014-9261-5
CLC number: TP [Automation and computer technology]
Discipline code: 0812
Abstract:
Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quicker convergence compared to value iteration and the ability to operate directly on the solution, which usually takes the form of a finite state automaton. However, the finite state controller tends to grow quickly in size across iterations, which makes its evaluation and improvement computationally costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration, and allow POMDPs to scale. In this article, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform bounded policy iteration with anytime behavior in settings formalized by the interactive POMDP framework, which generalizes POMDPs to non-stationary contexts shared with multiple other agents. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative. Its novel generalization in this article makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling other agents sharing the environment, we ascribe controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We show how we may exploit the agent's initial belief, often available, toward further improving the controller, particularly in large domains, though at the expense of increased computations, which we offset. We extensively evaluate the approach on multiple problem domains, some of which are significantly large in their dimensions, in contexts with uncertainty about the other agent's frames and in contexts involving multiple other agents, and demonstrate its properties and scalability.
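The abstract refers to evaluating and improving a finite state controller (FSC). The evaluation step it mentions can be sketched as follows: the value of each controller node in each hidden state satisfies a linear system, which can be solved by fixed-point iteration. This is a minimal illustrative sketch on a hypothetical tiger-style single-agent POMDP, not the paper's multiagent algorithm; all names, numbers, and the `evaluate_controller` helper are assumptions for illustration. In bounded policy iteration, this evaluation would be followed by a per-node linear-program improvement that keeps the controller size fixed.

```python
# Policy evaluation for a stochastic finite-state controller (FSC) on a toy
# tiger-style POMDP. psi[n][a] is the probability node n takes action a;
# eta[(n, a, o)][n'] is the probability of transitioning to node n' after
# taking a in n and observing o. All parameters here are illustrative.

GAMMA = 0.95
STATES = ["tiger-left", "tiger-right"]
ACTIONS = ["listen", "open-left", "open-right"]
OBSERVATIONS = ["hear-left", "hear-right"]

def T(s, a, s2):
    """Transition model: listening leaves the tiger in place; opening resets it."""
    if a == "listen":
        return 1.0 if s2 == s else 0.0
    return 0.5

def O(s2, a, o):
    """Observation model: listening is 85% accurate; opening is uninformative."""
    if a == "listen":
        correct = "hear-left" if s2 == "tiger-left" else "hear-right"
        return 0.85 if o == correct else 0.15
    return 0.5

def R(s, a):
    """Reward model: -1 to listen, +10 for the safe door, -100 for the tiger."""
    if a == "listen":
        return -1.0
    opened = "tiger-left" if a == "open-left" else "tiger-right"
    return -100.0 if s == opened else 10.0

def evaluate_controller(psi, eta, tol=1e-10):
    """Fixed-point iteration on V(n, s) = sum_a psi[n][a] * (R(s, a) +
    GAMMA * sum_{s', o, n'} T(s, a, s') O(s', a, o) eta[(n, a, o)][n'] V(n', s'))."""
    V = {(n, s): 0.0 for n in psi for s in STATES}
    while True:
        delta = 0.0
        for n in psi:
            for s in STATES:
                v = 0.0
                for a, pa in psi[n].items():
                    q = R(s, a)
                    for s2 in STATES:
                        for o in OBSERVATIONS:
                            for n2, pn in eta[(n, a, o)].items():
                                q += GAMMA * T(s, a, s2) * O(s2, a, o) * pn * V[(n2, s2)]
                    v += pa * q
                delta = max(delta, abs(v - V[(n, s)]))
                V[(n, s)] = v
        if delta < tol:
            return V

# A one-node controller that always listens earns -1 per step, so its
# value is -1 / (1 - GAMMA) = -20 in every state.
psi = {"n0": {"listen": 1.0}}
eta = {("n0", a, o): {"n0": 1.0} for a in ACTIONS for o in OBSERVATIONS}
V = evaluate_controller(psi, eta)
```

Bounded policy iteration would next try to replace each node's parameters (`psi`, `eta`) with a convex combination that dominates the current node's value vector, found by a linear program; the multiagent generalization in the paper additionally maintains controllers ascribed to the other agents.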
Pages: 455-494 (40 pages)