PALO bounds for reinforcement learning in partially observable stochastic games

Cited by: 5
Authors
Ceren, Roi [1 ]
He, Keyang [1 ]
Doshi, Prashant [1 ]
Banerjee, Bikramjit [2 ]
Affiliations
[1] Univ Georgia, Dept Comp Sci, THINC Lab, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, Hattiesburg, MS 39406 USA
Keywords
Multiagent systems; Reinforcement learning; POMDP; POSG; Framework
DOI
10.1016/j.neucom.2020.08.054
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A partially observable stochastic game (POSG) is a general model for multiagent decision making under uncertainty. Perkins' Monte Carlo exploring starts for POMDPs (MCES-P) integrates Monte Carlo exploring starts (MCES) into a local search of the policy space, offering an elegant template for model-free reinforcement learning in POSGs. However, multiagent reinforcement learning in POSGs is far more complex than in single-agent settings due to the heterogeneity of the agents and the divergence of their goals. In this article, we generalize reinforcement learning under partial observability to self-interested and cooperative multiagent settings under the POSG umbrella. We present three new templates for multiagent reinforcement learning in POSGs. MCES for interactive POMDPs (MCES-IP) extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES for multiagent POMDPs (MCES-MP) generalizes MCES-P to the canonical multiagent POMDP framework, with a single policy mapping joint observations of all agents to joint actions. Finally, MCES for factored-reward multiagent POMDPs (MCES-FMP) has each agent individually map joint observations to its own action. We use probabilistic approximate locally optimal (PALO) bounds to analyze sample complexity, thereby instantiating these templates as PALO learning. We promote sample efficiency with a policy-space pruning technique, evaluate the approaches on six benchmark domains, and compare them with state-of-the-art techniques; the results demonstrate that MCES-IP and MCES-FMP yield improved policies with fewer samples than the previous baselines. (C) 2020 Elsevier B.V. All rights reserved.
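The MCES-P template the abstract describes can be illustrated by a minimal sketch: hill-climb in policy space by perturbing one observation-to-action mapping at a time, estimate each policy's value by Monte Carlo rollouts, and accept a neighbor only when its estimated advantage clears a Hoeffding-style error bound, in the spirit of PALO's probabilistic local-optimality guarantee. Everything below is hypothetical (the toy two-state environment, the function names `step`, `rollout`, `palo_eps`, the horizon and noise parameters) and is not taken from the paper; it only sketches the comparison-with-a-bound idea under those assumptions.

```python
import random
from math import log, sqrt

random.seed(0)

def step(state, action):
    """Toy 2-state, 2-action environment (hypothetical). Action 0 keeps the
    state and pays 1 in state 0; action 1 flips the state for no reward."""
    if action == 0:
        reward = 1.0 if state == 0 else 0.0
        next_state = state
    else:
        reward = 0.0
        next_state = 1 - state
    # Partial observability: the observation is correct with probability 0.8.
    obs = next_state if random.random() < 0.8 else 1 - next_state
    return next_state, obs, reward

def rollout(policy, horizon=5):
    """Total reward of one episode under an observation->action policy."""
    state = random.choice([0, 1])
    obs = state if random.random() < 0.8 else 1 - state
    total = 0.0
    for _ in range(horizon):
        state, obs, r = step(state, policy[obs])
        total += r
    return total

def mc_value(policy, n):
    """Monte Carlo estimate of the policy's expected episode return."""
    return sum(rollout(policy) for _ in range(n)) / n

def palo_eps(n, delta=0.1, r_range=5.0):
    """PALO-style comparison threshold from a Hoeffding bound: with
    probability >= 1 - delta the estimate is within eps of the true mean."""
    return r_range * sqrt(log(2.0 / delta) / (2.0 * n))

# Local search in policy space (MCES-P-style template): transform one
# observation->action mapping at a time and keep the neighbor only when
# its estimated advantage exceeds the error bound.
policy, n = {0: 1, 1: 1}, 400
for o in (0, 1):
    for a in (0, 1):
        neighbor = dict(policy)
        neighbor[o] = a
        if neighbor == policy:
            continue
        if mc_value(neighbor, n) > mc_value(policy, n) + palo_eps(n):
            policy = neighbor

print(policy)
```

The bound-gated acceptance test is what distinguishes this from plain hill climbing: a neighbor is adopted only when the sample size makes the observed improvement statistically meaningful, which is the trade-off the paper's sample-complexity analysis formalizes.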
Pages: 36 - 56
Page count: 21
Related papers (50 in total)
  • [31] HSVI Can Solve Zero-Sum Partially Observable Stochastic Games
    Delage, Aurelien
    Buffet, Olivier
    Dibangoye, Jilles S.
    Saffidine, Abdallah
    [J]. DYNAMIC GAMES AND APPLICATIONS, 2024, 14 (04) : 751 - 805
  • [32] A reinforcement learning approach to stochastic business games
    Ravulapati, KK
    Rao, J
    Das, TK
    [J]. IIE TRANSACTIONS, 2004, 36 (04) : 373 - 385
  • [33] Sequential Halving for Partially Observable Games
    Pepels, Tom
    Cazenave, Tristan
    Winands, Mark H. M.
    [J]. COMPUTER GAMES, CGW 2015, 2016, 614 : 16 - 29
  • [34] Partially Observable Games for Secure Autonomy
    Ahmadi, Mohamadreza
    Viswanathan, Arun A.
    Ingham, Michel D.
    Tan, Kymie
    Ames, Aaron D.
    [J]. 2020 IEEE SYMPOSIUM ON SECURITY AND PRIVACY WORKSHOPS (SPW 2020), 2020, : 185 - 188
  • [35] Partially Observable Hierarchical Reinforcement Learning with AI Planning (Student Abstract)
    Rozek, Brandon
    Lee, Junkyu
    Kokel, Harsha
    Katz, Michael
    Sohrabi, Shirin
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23635 - 23636
  • [36] Global Linear Convergence of Online Reinforcement Learning for Partially Observable Systems
    Hirai, Takumi
    Sadamoto, Tomonori
    [J]. 2022 EUROPEAN CONTROL CONFERENCE (ECC), 2022, : 1566 - 1571
  • [37] Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning
    Cilden, Erkin
    Polat, Faruk
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (08) : 1414 - 1425
  • [38] Reinforcement learning with augmented states in partially expectation and action observable environment
    Guirnaldo, SA
    Watanabe, K
    Izumi, K
    Kiguchi, K
    [J]. SICE 2002: PROCEEDINGS OF THE 41ST SICE ANNUAL CONFERENCE, VOLS 1-5, 2002, : 823 - 828
  • [39] Reinforcement Learning based on MPC/MHE for Unmodeled and Partially Observable Dynamics
    Esfahani, Hossein Nejatbakhsh
    Kordabad, Arash Bahari
    Gros, Sebastien
    [J]. 2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 2121 - 2126
  • [40] Modeling and reinforcement learning in partially observable many-agent systems
    He, Keyang
    Doshi, Prashant
    Banerjee, Bikramjit
    [J]. AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2024, 38 (01)