Policy Evaluation in Decentralized POMDPs With Belief Sharing

Times cited: 0

Authors:

Kayaalp, Mert [1]

Ghadieh, Fatima [2]

Sayed, Ali H. [1]

Affiliations:

[1] École Polytechnique Fédérale de Lausanne (EPFL), Adaptive Systems Laboratory, CH-1015 Lausanne, Switzerland

[2] American University of Beirut, Beirut 11072020, Lebanon

Source:

IEEE OPEN JOURNAL OF CONTROL SYSTEMS | 2023 / Vol. 2

Keywords:

Task analysis; Data models; State estimation; Robot sensing systems; Reinforcement learning; Hidden Markov models; Bayes methods; Belief state; distributed state estimation; multi-agent reinforcement learning; partially observable Markov decision process; value function learning; LEARNING-BEHAVIOR; CONSENSUS; AVERAGE;

DOI:

10.1109/OJCSYS.2023.3277760

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Most work on multi-agent reinforcement learning focuses on scenarios where the state of the environment is fully observable. In this work, we consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly. Instead, agents only have access to noisy observations and to belief vectors. It is well known that finding global posterior distributions in multi-agent settings is generally NP-hard. As a remedy, we propose a fully decentralized belief-forming strategy that relies on individual updates and on localized interactions over a communication network. In addition to exchanging beliefs, agents also use the communication network to exchange value function parameter estimates. We show analytically that the proposed strategy allows information to diffuse over the network, which in turn keeps the difference between the agents' parameters and those of a centralized baseline bounded. A multi-sensor target tracking application is considered in the simulations.
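To make the two exchanges described in the abstract concrete, here is a minimal sketch, not the paper's exact recursions. It assumes an adapt-then-combine structure: each agent first performs a local Bayesian belief update (predict with the transition model, then correct with its own observation likelihood), and then combines its neighbors' intermediate beliefs through a weighted geometric average. For value learning, it assumes a linear value function V_k(b) = w_k . b on the belief, a local TD(0) step, and an averaging of the w_k parameters over the same combination weights. Everything here is hypothetical and chosen only for illustration: the function names (`local_update`, `geometric_combine`, `simulate`), the 2-state transition matrix `T`, the per-agent sensor models `obs_model`, the combination matrix `W`, the shared reward, and the step sizes.

```python
import math
import random


def normalize(v):
    """Scale a nonnegative vector so its entries sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_update(belief, T, likelihood):
    """Adapt step: propagate the belief through the transition model
    T[s][s_next], then apply Bayes' rule with this agent's likelihood
    vector likelihood[s_next] = P(own observation | s_next)."""
    n = len(belief)
    prior = [sum(T[s][sn] * belief[s] for s in range(n)) for sn in range(n)]
    return normalize([p * l for p, l in zip(prior, likelihood)])


def geometric_combine(beliefs, weights):
    """Combine step (assumed rule): weighted geometric average of the
    neighbors' intermediate beliefs, renormalized. weights must sum to 1."""
    n = len(beliefs[0])
    logs = [sum(w * math.log(b[s]) for w, b in zip(weights, beliefs)) for s in range(n)]
    top = max(logs)  # subtract the max before exp() for numerical stability
    return normalize([math.exp(x - top) for x in logs])


def simulate(steps=2000, seed=0, mu=0.05, gamma=0.5):
    """Hypothetical 2-state, 3-agent example: belief sharing plus
    diffusion TD(0) with a linear value V_k(b) = w_k . b."""
    rng = random.Random(seed)
    T = [[0.9, 0.1], [0.2, 0.8]]  # state transitions under the fixed policy
    obs_model = [                 # obs_model[k][s][o] = P(o | s) for agent k
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.6, 0.4], [0.3, 0.7]],
        [[0.55, 0.45], [0.45, 0.55]],
    ]
    W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]  # rows sum to 1
    reward = [1.0, 0.0]           # shared reward r(s), an assumption
    K, n = len(W), len(T)
    state = 0
    beliefs = [[1.0 / n] * n for _ in range(K)]
    w = [[0.0] * n for _ in range(K)]
    for _ in range(steps):
        r = reward[state]
        # Sample the next hidden state and one noisy observation per agent (2 states only).
        state = 0 if rng.random() < T[state][0] else 1
        obs = [0 if rng.random() < obs_model[k][state][0] else 1 for k in range(K)]
        # Belief sharing: local Bayesian adapt, then combine with neighbors.
        inter = [
            local_update(beliefs[k], T, [obs_model[k][s][obs[k]] for s in range(n)])
            for k in range(K)
        ]
        new_beliefs = [geometric_combine(inter, W[k]) for k in range(K)]
        # Value learning: local TD(0) step on belief features, then combine parameters.
        psi = []
        for k in range(K):
            delta = r + gamma * dot(w[k], new_beliefs[k]) - dot(w[k], beliefs[k])
            psi.append([wi + mu * delta * bi for wi, bi in zip(w[k], beliefs[k])])
        w = [[sum(W[k][l] * psi[l][i] for l in range(K)) for i in range(n)] for k in range(K)]
        beliefs = new_beliefs
    return beliefs, w
```

Calling `beliefs, w = simulate()` returns each agent's final belief and value-parameter vector. Because every agent averages its TD iterate with its neighbors at each step, the three `w` vectors stay close to one another. This illustrates the qualitative idea of information diffusing over the network; it does not reproduce the paper's analysis or its comparison with a centralized baseline.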


Pages: 125-145

Number of pages: 21

50 records in total

[41] Walking Motion Learning of Quadrupedal Walking Robot by Profit Sharing That Can Learn Deterministic Policy for POMDPs Environments
Morino, Yuya
Osana, Yuko
SIMULATED EVOLUTION AND LEARNING (SEAL 2014), 2014, 8886 : 323 - 334
[42] Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
Uehara, Masatoshi
Kiyohara, Haruka
Bennett, Andrew
Chernozhukov, Victor
Jiang, Nan
Kallus, Nathan
Shi, Chengchun
Sun, Wen
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[44] Policy iteration for bounded-parameter POMDPs
Ni, Yaodong
Liu, Zhi-Qiang
SOFT COMPUTING, 2013, 17 (04) : 537 - 548
[45] Decentralized Learning of Finite-Memory Policies in Dec-POMDPs
Mao, Weichao
Zhang, Kaiqing
Yang, Zhuoran
Basar, Tamer
IFAC PAPERSONLINE, 2023, 56 (02) : 2601 - 2607
[46] Search and Explore: Symbiotic Policy Synthesis in POMDPs
Andriushchenko, Roman
Bork, Alexander
Ceska, Milan
Junges, Sebastian
Katoen, Joost-Pieter
Macak, Filip
COMPUTER AIDED VERIFICATION, CAV 2023, PT III, 2023, 13966 : 113 - 135
[47] Factorized Asymptotic Bayesian Policy Search for POMDPs
Imaizumi, Masaaki
Fujimaki, Ryohei
PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4346 - 4352
[48] A Role-based POMDPs Approach for Decentralized Implicit Cooperation of Multiple Agents
Zhang, Hao
Chen, Jie
Fang, Hao
Dou, Lihua
2017 13TH IEEE INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2017, : 496 - 501
[49] INTERACTIONS AMONG EVALUATION, PROGRAM IMPLEMENTATION, AND POLICY IN A DECENTRALIZED SYSTEM
HEDRICK, TE
POLICY STUDIES JOURNAL, 1980, 8 : 1203 - 1212
[50] Kernel-Based Decentralized Policy Evaluation for Reinforcement Learning
Liu, Jiamin
Lian, Heng
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
