A near-optimal poly-time algorithm for learning in a class of stochastic games

Cited by: 0
Authors
Brafman, RI [1]
Tennenholtz, M [1]
Affiliations
[1] Ben Gurion Univ Negev, Dept Math & Comp Sci, IL-84105 Beer Sheva, Israel
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We present a new algorithm for polynomial-time learning of near-optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes. Namely, given information about particular parts of a game matrix, how much effort should the agent invest in learning its unknown parts? We explain and address these issues within the class of single-controller stochastic games. This solution can be extended to stochastic games in general.
Pages: 734-739
Number of pages: 6
Related papers
50 in total
  • [1] A near-optimal polynomial time algorithm for learning in certain classes of stochastic games
    Brafman, RI
    Tennenholtz, M
    ARTIFICIAL INTELLIGENCE, 2000, 121 (1-2) : 31 - 47
  • [2] Near-Optimal No-Regret Learning in General Games
    Daskalakis, Constantinos
    Fishelson, Maxwell
    Golowich, Noah
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
    Sidford, Aaron
    Wang, Mengdi
    Yang, Lin F.
    Ye, Yinyu
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108
  • [4] An Optimal Algorithm for the Stochastic Bandits with Knowing Near-optimal Mean Reward
    Yang, Shangdong
    Wang, Hao
    Gao, Yang
    Chen, Xingguo
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS (AAMAS' 18), 2018, : 2130 - 2132
  • [5] Near-Optimal Reinforcement Learning in Polynomial Time
    Michael Kearns
    Satinder Singh
    Machine Learning, 2002, 49 : 209 - 232
  • [6] Near-optimal reinforcement learning in polynomial time
    Kearns, M
    Singh, S
    MACHINE LEARNING, 2002, 49 (2-3) : 209 - 232
  • [7] Near-Optimal Φ-Regret Learning in Extensive-Form Games
    Anagnostides, Ioannis
    Farina, Gabriele
    Sandholm, Tuomas
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202 : 814 - 839
  • [8] An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward
    Yang, Shangdong
    Gao, Yang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (05) : 2285 - 2291
  • [9] Near-Optimal Learning of Extensive-Form Games with Imperfect Information
    Bai, Yu
    Jin, Chi
    Mei, Song
    Yu, Tiancheng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [10] Near-Optimal No-Regret Learning Dynamics for General Convex Games
    Farina, Gabriele
    Anagnostides, Ioannis
    Luo, Haipeng
    Lee, Chung-Wei
    Kroer, Christian
    Sandholm, Tuomas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022