A Bayesian reinforcement learning approach in Markov games for computing near-optimal policies

Author
Julio B. Clempner
Affiliation
Instituto Politécnico Nacional (National Polytechnic Institute), Escuela Superior de Física y Matemáticas (School of Physics and Mathematics), Building 9, Av. Instituto Politécnico Nacional
Keywords
Reinforcement learning; Bayesian inference; Markov games with private information; Bayesian equilibrium; 91A10; 91A40; 91A26; 62C10; 60J20
DOI
Not available
Abstract
Bayesian learning is an inference method, used within the reinforcement learning (RL) paradigm, that addresses the exploration-exploitation trade-off as a function of the uncertainty in a probability model estimated from observations. It allows prior knowledge to be incorporated into the algorithms in the form of probability distributions. Finding the resulting Bayes-optimal policies is a notoriously difficult problem. We focus on RL for a special class of ergodic and controllable Markov games. We propose a new framework for computing near-optimal policies for each agent, assuming that the Markov chains are regular and that the inverse of the behavior strategy is well defined. A fundamental result of this paper is a theoretical method that, based on the formulation of a nonlinear problem, computes the near-optimal adaptive behavior strategies and policies of the game that maximize the expected reward under certain restrictions. We prove that these behavior strategies and policies satisfy the Bayesian-Nash equilibrium. Another important result is that the RL process learns a model through the interaction of the agents with the environment; we show how the proposed method can finitely approximate and estimate the elements of the transition matrices and utilities while maintaining an efficient long-term learning performance measure. We develop an algorithm to implement this model. A numerical example shows how to deploy the estimation process as a function of the agents' experiences.
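The abstract does not state the estimation formulas. Purely as an illustration, the sketch below shows one generic Bayesian way to estimate transition matrices and utilities from agent experiences: an independent Dirichlet prior on each row p(. | s, a), updated by observed counts (Dirichlet-categorical conjugacy), a sample-mean utility per (s, a), and a check that the chain induced by a fixed policy is regular. This is an assumed, textbook technique, not the author's algorithm. The names `DirichletTransitionModel`, `induced_chain`, `is_regular` and the parameter `prior` are hypothetical, and joint actions of a Markov game are flattened into a single index `a` for simplicity.

```python
from typing import List


class DirichletTransitionModel:
    """Hypothetical sketch: Bayesian estimate of p(s' | s, a) and mean utilities.

    Each row p(. | s, a) has an independent Dirichlet(prior, ..., prior) prior.
    Because the Dirichlet is conjugate to the categorical likelihood, the
    posterior is Dirichlet(prior + counts); its mean is the point estimate.
    In a Markov game, `a` would index a joint action profile (assumption).
    """

    def __init__(self, n_states: int, n_actions: int, prior: float = 1.0) -> None:
        if prior <= 0:
            raise ValueError("Dirichlet concentration must be positive")
        # alpha[s][a][s2]: prior concentration plus observed transition counts
        self.alpha = [[[prior] * n_states for _ in range(n_actions)]
                      for _ in range(n_states)]
        # Running sums for a sample-mean estimate of the utility of (s, a)
        self.reward_sum = [[0.0] * n_actions for _ in range(n_states)]
        self.visits = [[0] * n_actions for _ in range(n_states)]

    def update(self, s: int, a: int, reward: float, s_next: int) -> None:
        """Conjugate update from one observed experience (s, a, reward, s_next)."""
        self.alpha[s][a][s_next] += 1.0
        self.reward_sum[s][a] += reward
        self.visits[s][a] += 1

    def transition_estimate(self, s: int, a: int) -> List[float]:
        """Posterior mean of the row p(. | s, a)."""
        row = self.alpha[s][a]
        total = sum(row)
        return [x / total for x in row]

    def utility_estimate(self, s: int, a: int) -> float:
        """Sample-mean utility of (s, a); 0.0 if (s, a) was never visited."""
        n = self.visits[s][a]
        return self.reward_sum[s][a] / n if n else 0.0


def induced_chain(model: DirichletTransitionModel,
                  policy: List[List[float]]) -> List[List[float]]:
    """Estimated chain P(s, s2) = sum_a policy[s][a] * p_hat(s2 | s, a)."""
    n = len(policy)
    P = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for a, weight in enumerate(policy[s]):
            row = model.transition_estimate(s, a)
            for s2 in range(n):
                P[s][s2] += weight * row[s2]
    return P


def is_regular(P: List[List[float]]) -> bool:
    """True if some power P^k has all entries > 0.

    For an n-state chain it suffices to check k <= (n - 1)**2 + 1
    (Wielandt's bound for primitive matrices).
    """
    n = len(P)
    Q = [row[:] for row in P]
    for _ in range((n - 1) ** 2 + 1):
        if all(x > 0 for row in Q for x in row):
            return True
        # Q <- Q @ P
        Q = [[sum(Q[i][k] * P[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return False


if __name__ == "__main__":
    model = DirichletTransitionModel(n_states=2, n_actions=1, prior=1.0)
    for s, a, r, s2 in [(0, 0, 1.0, 1), (0, 0, 0.0, 1), (1, 0, 2.0, 0)]:
        model.update(s, a, r, s2)
    print(model.transition_estimate(0, 0))  # [0.25, 0.75]
    print(model.utility_estimate(0, 0))     # 0.5
    print(is_regular(induced_chain(model, [[1.0], [1.0]])))  # True
```

The uniform prior (`prior=1.0`) keeps every estimated entry strictly positive before any data arrive, so the estimated chain is regular from the start; a smaller `prior` lets observed counts dominate sooner. Both choices are assumptions of this sketch.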
Pages: 675–690
Number of pages: 15