A near-optimal poly-time algorithm for learning in a class of stochastic games

Cited by: 0

Authors:

Brafman, RI [1]

Tennenholtz, M [1]

Affiliations:

[1] Ben Gurion Univ Negev, Dept Math & Comp Sci, IL-84105 Beer Sheva, Israel

Source:

IJCAI-99: PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOLS 1 & 2 | 1999

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a new algorithm for polynomial-time learning of near-optimal behavior in stochastic games. This algorithm integrates important recent results of Kearns and Singh [1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes: given information about particular parts of a game matrix, how much effort should the agent invest in learning its unknown parts? We explain and address these issues within the class of single-controller stochastic games. This solution can be extended to stochastic games in general.


Pages: 734-739

Page count: 6

