PALO bounds for reinforcement learning in partially observable stochastic games

Cited by: 5
Authors
Ceren, Roi [1 ]
He, Keyang [1 ]
Doshi, Prashant [1 ]
Banerjee, Bikramjit [2 ]
Affiliations
[1] Univ Georgia, Dept Comp Sci, THINC Lab, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, Hattiesburg, MS 39406 USA
Keywords
Multiagent systems; Reinforcement learning; POMDP; POSG; Framework
DOI
10.1016/j.neucom.2020.08.054
CLC Classification Number
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A partially observable stochastic game (POSG) is a general model for multiagent decision making under uncertainty. Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts (MCES) into a local search of the policy space, offering an elegant template for model-free reinforcement learning in POSGs. However, multiagent reinforcement learning in POSGs is far more complex than in single-agent settings because of the heterogeneity of the agents and the divergence of their goals. In this article, we generalize reinforcement learning under partial observability to self-interested and cooperative multiagent settings under the POSG umbrella. We present three new templates for multiagent reinforcement learning in POSGs. MCES for interactive POMDPs (MCES-IP) extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES for multiagent POMDPs (MCES-MP) generalizes MCES-P to the canonical multiagent POMDP framework, with a single policy mapping joint observations of all agents to joint actions. Finally, MCES for factored-reward multiagent POMDPs (MCES-FMP) has each agent individually map joint observations to its own actions. We use probabilistic approximate locally optimal (PALO) bounds to analyze sample complexity, thereby instantiating these templates as PALO learning. We promote sample efficiency with a policy space pruning technique, and we evaluate the approaches on six benchmark domains as well as against state-of-the-art techniques, demonstrating that MCES-IP and MCES-FMP yield improved policies with fewer samples than the previous baselines. (C) 2020 Elsevier B.V. All rights reserved.
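The abstract's core idea — a local search over the policy space that commits to a neighboring policy only when its sampled advantage exceeds a threshold, with the sample count chosen from a Hoeffding-style bound — can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm: the function names, the neighbor structure, and the exact bound constant are assumptions made for exposition.

```python
import math
import random

def hoeffding_samples(epsilon, delta, value_range=1.0):
    """Samples needed so the empirical mean of returns bounded in a range of
    width `value_range` lies within epsilon of its expectation with
    probability >= 1 - delta (the paper's exact constant may differ)."""
    return math.ceil(2.0 * (value_range / epsilon) ** 2 * math.log(2.0 / delta))

def palo_local_search(policies, neighbors, sample_return,
                      epsilon=0.1, delta=0.05, value_range=1.0):
    """Hill-climb in policy space: transform to a neighboring policy only
    when its estimated value beats the current policy's by more than epsilon."""
    k = hoeffding_samples(epsilon, delta, value_range)
    current = random.choice(policies)  # exploring start
    improved = True
    while improved:
        improved = False
        v_cur = sum(sample_return(current) for _ in range(k)) / k
        for nbr in neighbors(current):
            v_nbr = sum(sample_return(nbr) for _ in range(k)) / k
            if v_nbr - v_cur > epsilon:  # accept only a significant estimated gain
                current, improved = nbr, True
                break
    return current  # approximately locally optimal w.h.p.
```

Under this reading, the three templates differ mainly in what a "policy" and a "sampled return" are: MCES-IP samples returns while predicting the other agent's actions from beliefs over models, MCES-MP searches over a single joint policy, and MCES-FMP searches over per-agent policies with factored rewards.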
Pages: 36-56 (21 pages)
Related Papers (50 in total)
  • [41] Partially Observable Reinforcement Learning for Dialog-based Interactive Recommendation
    Wu, Yaxiong
    Macdonald, Craig
    Ounis, Iadh
    [J]. 15TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS 2021), 2021, : 241 - 251
  • [42] Collaborative Partially-Observable Reinforcement Learning Using Wireless Communications
    Ko, Eisaku
    Chen, Kwang-Cheng
    Lien, Shao-Yu
    [J]. IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,
  • [43] Heuristic Search Value Iteration for One-Sided Partially Observable Stochastic Games
    Horak, Karel
    Bosansky, Branislav
    Pechoucek, Michal
    [J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 558 - 564
  • [44] Solving zero-sum one-sided partially observable stochastic games
    Horak, Karel
    Bosansky, Branislav
    Kovarik, Vojtech
    Kiekintveld, Christopher
    [J]. ARTIFICIAL INTELLIGENCE, 2023, 316
  • [45] Optimal Honeypot Allocation Using Core Attack Graph in Partially Observable Stochastic Games
    Nguemkam, Achile Leonel
    Anwar, Ahmed Hemida
    Tchendji, Vianney Kengne
    Tosh, Deepak K.
    Kamhoua, Charles
    [J]. IEEE ACCESS, 2024, 12 : 187444 - 187455
  • [46] PARTIALLY OBSERVABLE STOCHASTIC OPTIMAL CONTROL
    Wang, Guangchen
    Xiong, Jie
    Zhang, Shuaiqi
    [J]. INTERNATIONAL JOURNAL OF NUMERICAL ANALYSIS AND MODELING, 2016, 13 (04) : 493 - 512
  • [47] Partially observable multistage stochastic programming
    Dowson, Oscar
    Morton, David P.
    Pagnoncelli, Bernardo K.
    [J]. OPERATIONS RESEARCH LETTERS, 2020, 48 (04) : 505 - 512
  • [48] Deep Reinforcement Learning for Partially Observable Data Poisoning Attack in Crowdsensing Systems
    Li, Mohan
    Sun, Yanbin
    Lu, Hui
    Maharjan, Sabita
    Tian, Zhihong
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2020, 7 (07): : 6266 - 6278
  • [49] A Deep Hierarchical Reinforcement Learning Algorithm in Partially Observable Markov Decision Processes
    Le, Tuyen P.
    Ngo Anh Vien
    Chung, Taechoong
    [J]. IEEE ACCESS, 2018, 6 : 49089 - 49102
  • [50] Fuzzy Reinforcement Learning Control for Decentralized Partially Observable Markov Decision Processes
    Sharma, Rajneesh
    Spaan, Matthijs T. J.
    [J]. IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 1422 - 1429