Policy Evaluation in Decentralized POMDPs With Belief Sharing

Cited by: 0
Authors
Kayaalp, Mert [1 ]
Ghadieh, Fatima [2 ]
Sayed, Ali H. [1 ]
Affiliations
[1] Ecole Polytech Fed Lausanne EPFL, Adapt Syst Lab, CH-1015 Lausanne, Switzerland
[2] Amer Univ Beirut, Beirut 11072020, Lebanon
Source
IEEE OPEN JOURNAL OF CONTROL SYSTEMS
Keywords
Task analysis; Data models; State estimation; Robot sensing systems; Reinforcement learning; Hidden Markov models; Bayes methods; Belief state; distributed state estimation; multi-agent reinforcement learning; partially observable Markov decision process; value function learning; LEARNING-BEHAVIOR; CONSENSUS; AVERAGE;
DOI
10.1109/OJCSYS.2023.3277760
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Most works on multi-agent reinforcement learning focus on scenarios where the state of the environment is fully observable. In this work, we consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly. Instead, agents only have access to noisy observations and to belief vectors. It is well known that computing global posterior distributions in multi-agent settings is generally NP-hard. As a remedy, we propose a fully decentralized belief-forming strategy that relies on individual updates and on localized interactions over a communication network. In addition to exchanging beliefs, agents also exploit the communication network to exchange value function parameter estimates. We show analytically that the proposed strategy allows information to diffuse over the network, which in turn keeps the agents' parameters within a bounded difference of a centralized baseline. A multi-sensor target tracking application is considered in the simulations.
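The abstract describes two quantities that agents diffuse over the network: belief vectors and value function parameter estimates. The Python sketch below illustrates one possible adapt-then-combine round of such a scheme; the combination matrix, the log-linear belief pooling, the linear value model on belief features, and the TD(0)-style adaptation step are all assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only (not the paper's algorithm): one decentralized
# policy-evaluation round in which each agent (i) updates its belief from a
# noisy observation, (ii) pools beliefs with neighbors, (iii) adapts its
# value-function parameters with a local TD(0)-style step, and (iv) combines
# the neighbors' intermediate parameter estimates.
import numpy as np

rng = np.random.default_rng(0)

K, S, d = 4, 5, 5                    # agents, hidden states, feature dimension
A = np.full((K, K), 1.0 / K)         # assumed doubly stochastic combination matrix
phi = np.eye(S)                      # one-hot features over states (d == S here)
gamma, mu = 0.9, 0.1                 # discount factor and step size

# Assumed known per-agent sensor models: obs_lik[k, s, :] = p(obs | state s) at agent k.
obs_lik = rng.dirichlet(np.ones(S), size=(K, S))

beliefs = np.full((K, S), 1.0 / S)   # local belief vectors
w = np.zeros((K, d))                 # local value-function parameters

def step(state, reward):
    """One adapt-then-combine round (state transition omitted for brevity)."""
    global beliefs, w
    # (i) Local Bayesian-style belief update from each agent's noisy observation.
    obs = [rng.choice(S, p=obs_lik[k, state]) for k in range(K)]
    adapted = np.array([beliefs[k] * obs_lik[k, :, obs[k]] for k in range(K)])
    adapted /= adapted.sum(axis=1, keepdims=True)
    # (ii) Belief sharing: log-linear (geometric) pooling over neighbors.
    pooled = np.exp(A @ np.log(adapted + 1e-12))
    new_beliefs = pooled / pooled.sum(axis=1, keepdims=True)
    # (iii) Local TD(0)-style adaptation on belief features.
    feat, next_feat = beliefs @ phi, new_beliefs @ phi
    psi = np.zeros((K, d))
    for k in range(K):
        delta = reward + gamma * next_feat[k] @ w[k] - feat[k] @ w[k]
        psi[k] = w[k] + mu * delta * feat[k]
    # (iv) Parameter sharing: combine neighbors' intermediate estimates.
    w = A @ psi
    beliefs = new_beliefs

# Example usage: a few rounds on a randomly drawn (static) hidden state.
s = int(rng.integers(S))
for _ in range(10):
    step(s, reward=1.0)
```

In this sketch, the doubly stochastic combination matrix keeps the network-wide average of the parameter estimates well defined, which is the kind of coupling that makes a bounded difference from a centralized baseline plausible; the actual conditions and guarantees are those stated in the paper.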
Pages: 125 - 145
Page count: 21
Related Papers
50 records in total
  • [41] Walking Motion Learning of Quadrupedal Walking Robot by Profit Sharing That Can Learn Deterministic Policy for POMDPs Environments
    Morino, Yuya
    Osana, Yuko
    SIMULATED EVOLUTION AND LEARNING (SEAL 2014), 2014, 8886 : 323 - 334
  • [42] Future-Dependent Value-Based Off-Policy Evaluation in POMDPs
    Uehara, Masatoshi
    Kiyohara, Haruka
    Bennett, Andrew
    Chernozhukov, Victor
    Jiang, Nan
    Kallus, Nathan
    Shi, Chengchun
    Sun, Wen
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [43] Policy iteration for bounded-parameter POMDPs
    Yaodong Ni
    Zhi-Qiang Liu
    Soft Computing, 2013, 17 : 537 - 548
  • [44] Policy iteration for bounded-parameter POMDPs
    Ni, Yaodong
    Liu, Zhi-Qiang
    SOFT COMPUTING, 2013, 17 (04) : 537 - 548
  • [45] Decentralized Learning of Finite-Memory Policies in Dec-POMDPs
    Mao, Weichao
    Zhang, Kaiqing
    Yang, Zhuoran
    Basar, Tamer
    IFAC PAPERSONLINE, 2023, 56 (02): : 2601 - 2607
  • [46] Search and Explore: Symbiotic Policy Synthesis in POMDPs
    Andriushchenko, Roman
    Bork, Alexander
    Ceska, Milan
    Junges, Sebastian
    Katoen, Joost-Pieter
    Macak, Filip
    COMPUTER AIDED VERIFICATION, CAV 2023, PT III, 2023, 13966 : 113 - 135
  • [47] Factorized Asymptotic Bayesian Policy Search for POMDPs
    Imaizumi, Masaaki
    Fujimaki, Ryohei
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4346 - 4352
  • [48] A Role-based POMDPs Approach for Decentralized Implicit Cooperation of Multiple Agents
    Zhang, Hao
    Chen, Jie
    Fang, Hao
    Dou, Lihua
    2017 13TH IEEE INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2017, : 496 - 501
  • [50] Kernel-Based Decentralized Policy Evaluation for Reinforcement Learning
    Liu, Jiamin
    Lian, Heng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,