PALO bounds for reinforcement learning in partially observable stochastic games

Cited by: 5
Authors
Ceren, Roi [1]
He, Keyang [1]
Doshi, Prashant [1]
Banerjee, Bikramjit [2]
Affiliations
[1] Univ Georgia, Dept Comp Sci, THINC Lab, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, Hattiesburg, MS 39406 USA
Keywords
Multiagent systems; Reinforcement learning; POMDP; POSG; Framework
DOI
10.1016/j.neucom.2020.08.054
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A partially observable stochastic game (POSG) is a general model for multiagent decision making under uncertainty. Perkins' Monte Carlo exploring starts for partially observable Markov decision processes (POMDPs), MCES-P, integrates Monte Carlo exploring starts (MCES) into a local search of the policy space, offering an elegant template for model-free reinforcement learning in POSGs. However, multiagent reinforcement learning in POSGs is substantially more complex than in single-agent settings due to the heterogeneity of the agents and the discrepancy of their goals. In this article, we generalize reinforcement learning under partial observability to self-interested and cooperative multiagent settings under the POSG umbrella. We present three new templates for multiagent reinforcement learning in POSGs. MCES for interactive POMDPs (MCES-IP) extends MCES-P by maintaining predictions of the other agent's actions based on dynamic beliefs over models. MCES for multiagent POMDPs (MCES-MP) generalizes MCES-P to the canonical multiagent POMDP framework, with a single policy mapping joint observations of all agents to joint actions. Finally, MCES for factored-reward multiagent POMDPs (MCES-FMP) has each agent individually map joint observations to its own action. We use probabilistic approximate locally optimal (PALO) bounds to analyze sample complexity, thereby instantiating these templates to PALO learning. We promote sample efficiency with a policy space pruning technique and evaluate the approaches on six benchmark domains, comparing against state-of-the-art techniques; the results demonstrate that MCES-IP and MCES-FMP yield improved policies with fewer samples than the previous baselines. (C) 2020 Elsevier B.V. All rights reserved.
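The MCES-P template the abstract builds on can be illustrated with a minimal sketch: hill-climb over deterministic policies, estimating each candidate's value by Monte Carlo rollouts, and accept a neighbouring policy only when a Hoeffding-style sample bound (standing in for the paper's PALO bounds) says the empirical improvement is reliable. The toy two-observation environment, the reward function, and the constants below are illustrative assumptions, not the paper's benchmark domains or actual PALO constants.

```python
import math
import random

def rollout(policy, rng, horizon=5):
    """One Monte Carlo rollout: toy stand-in for a POMDP episode."""
    total = 0.0
    for _ in range(horizon):
        obs = rng.randint(0, 1)          # noisy observation (illustrative)
        action = policy[obs]
        # matching the action to the observation pays off, plus noise
        total += (1.0 if action == obs else 0.0) + rng.uniform(-0.1, 0.1)
    return total

def neighbours(policy, n_actions=2):
    """All policies differing from `policy` at exactly one observation."""
    for obs in range(len(policy)):
        for a in range(n_actions):
            if a != policy[obs]:
                cand = list(policy)
                cand[obs] = a
                yield tuple(cand)

def mces_p_sketch(epsilon=0.5, delta=0.1, reward_range=6.0, seed=0):
    """Local search in policy space with a Hoeffding-style acceptance test
    (PALO-flavoured sketch, not the paper's exact bound)."""
    rng = random.Random(seed)
    # samples so an empirical gap of epsilon is trustworthy at level delta
    n = math.ceil((2 * reward_range ** 2 / epsilon ** 2) * math.log(2 / delta))
    policy = (0, 0)                      # arbitrary deterministic start
    improved = True
    while improved:
        improved = False
        base = sum(rollout(policy, rng) for _ in range(n)) / n
        for cand in neighbours(policy):
            val = sum(rollout(cand, rng) for _ in range(n)) / n
            if val - base > epsilon:     # locally significant improvement
                policy, improved = cand, True
                break
    return policy
```

With these toy dynamics the search climbs from the all-zeros policy to the observation-matching policy `(0, 1)`; the multiagent templates in the paper (MCES-IP, MCES-MP, MCES-FMP) differ in whose observations the policy maps and how other agents' actions are predicted, not in this basic accept-if-significantly-better loop.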
Pages: 36-56 (21 pages)