Scalable solutions of interactive POMDPs using generalized and bounded policy iteration

Cited by: 12

Authors:

Sonu, Ekhlas [1]

Doshi, Prashant [1]

Affiliations:

[1] Univ Georgia, Dept Comp Sci, Athens, GA 30602 USA

Source:

AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS | 2015 / Vol. 29 / Issue 03

Funding:

US National Science Foundation;

Keywords:

Decision making; Multiagent settings; Policy iteration; POMDP; OBSERVABLE MARKOV-PROCESSES; DECENTRALIZED CONTROL; DECISION-MAKING; ALGORITHMS; COMPLEXITY;

DOI:

10.1007/s10458-014-9261-5

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quicker convergence compared to value iteration and the ability to operate directly on the solution, which usually takes the form of a finite state automaton. However, the finite state controller tends to grow quickly in size across iterations due to which its evaluation and improvement become computationally costly. Bounded policy iteration provides a way of keeping the controller size fixed while improving it monotonically until convergence, although it is susceptible to getting trapped in local optima. Despite these limitations, policy iteration algorithms are viable alternatives to value iteration, and allow POMDPs to scale. In this article, we generalize the bounded policy iteration technique to problems involving multiple agents. Specifically, we show how we may perform bounded policy iteration with anytime behavior in settings formalized by the interactive POMDP framework, which generalizes POMDPs to non-stationary contexts shared with multiple other agents. Although policy iteration has been extended to decentralized POMDPs, the context there is strictly cooperative. Its novel generalization in this article makes it useful in non-cooperative settings as well. As interactive POMDPs involve modeling other agents sharing the environment, we ascribe controllers to predict others' actions, with the benefit that the controllers compactly represent the model space. We show how we may exploit the agent's initial belief, often available, toward further improving the controller, particularly in large domains, though at the expense of increased computations, which we compensate. We extensively evaluate the approach on multiple problem domains with some that are significantly large in their dimensions, and in contexts with uncertainty about the other agent's frames and those involving multiple other agents, and demonstrate its properties and scalability.

Citation

Pages: 455-494

Page count: 40

50 items in total

[21] Towards Efficient Computation of Error Bounded Solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs
Varakantham, Pradeep
Maheswaran, Rajiv T.
Gupta, Tapana
Tambe, Milind
20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2638 - 2643
[22] Improved Planning for Infinite-Horizon Interactive POMDPs Using Probabilistic Inference
Qu, Xia
Doshi, Prashant
PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS (AAMAS'15), 2015, : 1839 - 1840
[23] Discrete-Time Nonlinear Generalized Policy Iteration for Optimal Control Using Neural Networks
Wei, Qinglai
Liu, Derong
Yang, Xiong
NEURAL INFORMATION PROCESSING (ICONIP 2014), PT I, 2014, 8834 : 389 - 396
[24] On Generalized Policy Iteration for Continuous-Time Linear Systems
Lee, Jae Young
Chun, Tae Yoon
Park, Jin Bae
Choi, Yoon Ho
2011 50TH IEEE CONFERENCE ON DECISION AND CONTROL AND EUROPEAN CONTROL CONFERENCE (CDC-ECC), 2011, : 1722 - 1728
[25] Memory Bounded Open-Loop Planning in Large POMDPs Using Thompson Sampling
Phan, Thomy
Belzner, Lenz
Kiermeier, Marie
Friedrich, Markus
Schmid, Kyrill
Linnhoff-Popien, Claudia
THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 7941 - 7948
[26] Robust topological policy iteration for infinite horizon bounded Markov Decision Processes
Silva Reis, Willy Arthur
de Barros, Leliane Nunes
Delgado, Karina Valdivia
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2019, 105 : 287 - 304
[27] Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies
Phan, Thomy
Schmid, Kyrill
Belzner, Lenz
Gabor, Thomas
Feld, Sebastian
Linnhoff-Popien, Claudia
AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2162 - 2164
[28] GLOBAL WEAK SOLUTIONS FOR GENERALIZED SQG IN BOUNDED DOMAINS
Huy Quang Nguyen
ANALYSIS & PDE, 2018, 11 (04): : 1029 - 1047
[29] BOUNDED SOLUTIONS OF A GENERALIZED GOLAB-SCHINZEL EQUATION
Jablonska, Eliza
DEMONSTRATIO MATHEMATICA, 2009, 42 (03) : 533 - 547
[30] Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs
Han, Yanlin
Gmytrasiewicz, Piotr
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
