Efficient Learning in Polyhedral Games via Best-Response Oracles

Cited by: 0

Authors:

Chakrabarti, Darshan [1]

Farina, Gabriele [2]

Kroer, Christian [1]

Affiliations:

[1] Columbia Univ, New York, NY 10027 USA

[2] MIT, Cambridge, MA 02139 USA

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9 | 2024

Funding:

U.S. National Science Foundation;

Keywords:

COMPUTATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study online learning and equilibrium computation in games with polyhedral decision sets, a property shared by normal-form games (NFGs) and extensive-form games (EFGs), when the learning agent is restricted to utilizing a best-response oracle. We show how to achieve constant regret in zero-sum games and $O(T^{1/4})$ regret in general-sum games while using only $O(\log t)$ best-response queries at a given iteration $t$, thus improving over the best prior result, which required $O(T)$ queries per iteration. Moreover, our framework yields the first last-iterate convergence guarantees for self-play with best-response oracles in zero-sum games. This convergence occurs at a linear rate, though with a condition-number dependence. We go on to show a $O(1/\sqrt{T})$ best-iterate convergence rate without such a dependence. Our results build on linear-rate convergence results for variants of the Frank-Wolfe (FW) algorithm for strongly convex and smooth minimization problems over polyhedral domains. These FW results depend on a condition number of the polytope, known as facial distance. In order to enable application to settings such as EFGs, we show two broad new results: 1) the facial distance for polytopes of the form $\{x \in \mathbb{R}^n_{\geq 0} \mid Ax = b\}$ is at least $\gamma/\sqrt{k}$, where $\gamma$ is the minimum value of a nonzero coordinate of a vertex in the polytope and $k \leq n$ is the number of tight inequality constraints in the optimal face, and 2) the facial distance for polytopes of the form $Ax = b$, $Cx \leq d$, $x \geq 0$, where $x \in \mathbb{R}^n$, $C \geq 0$ is a nonzero integral matrix, and $d \geq 0$, is at least $1/(\|C\|_\infty \sqrt{n})$. This yields the first such results for several problems, such as sequence-form polytopes, flow polytopes, and matching polytopes.
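
As a rough illustration of the oracle model mentioned in the abstract, the sketch below runs the vanilla Frank-Wolfe method on the probability simplex, where each iteration makes a single best-response (linear minimization) call. This is not the authors' algorithm: the abstract refers to linear-rate FW variants, whereas this is the basic method with the standard 2/(t+2) step size. The quadratic objective, the choice of the simplex as the polytope, and the names `best_response`, `frank_wolfe`, and `target` are all assumptions made for the example.

```python
def best_response(gradient):
    """Best-response oracle on the simplex: the vertex minimizing <gradient, v>."""
    idx = min(range(len(gradient)), key=lambda i: gradient[i])
    return [1.0 if i == idx else 0.0 for i in range(len(gradient))]


def frank_wolfe(target, num_iters=200):
    """Minimize f(x) = 0.5 * ||x - target||^2 over the probability simplex.

    Vanilla Frank-Wolfe (assumed here for illustration only): one oracle
    call per iteration, followed by a convex-combination update.
    """
    n = len(target)
    x = [1.0 / n] * n  # start at the uniform distribution
    for t in range(num_iters):
        gradient = [xi - ti for xi, ti in zip(x, target)]
        vertex = best_response(gradient)
        step = 2.0 / (t + 2.0)  # standard open-loop step size
        x = [(1.0 - step) * xi + step * vi for xi, vi in zip(x, vertex)]
    return x


# Hypothetical target that already lies in the simplex, so it is the minimizer.
target = [0.2, 0.5, 0.3]
x_final = frank_wolfe(target)
```

Every iterate stays inside the polytope because it is a convex combination of vertices returned by the oracle, which is why the method needs no projection step.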


Pages: 9564-9572

Page count: 9

50 records in total

[1] Solving Zero-Sum Games Using Best-Response Oracles with Applications to Search Games
Hellerstein, Lisa
Lidbetter, Thomas
Pirutinsky, Daniel
OPERATIONS RESEARCH, 2019, 67 (03) : 731 - 743
[2] Best-response potential games
Voorneveld, M
ECONOMICS LETTERS, 2000, 66 (03) : 289 - 295
[3] Network Learning from Best-Response Dynamics in LQ Games
Chen, Yijun
Ding, Kemi
Shi, Guodong
2023 AMERICAN CONTROL CONFERENCE, ACC, 2023: 1680 - 1685
[4] Network Learning in Quadratic Games From Best-Response Dynamics
Ding, Kemi
Chen, Yijun
Wang, Lei
Ren, Xiaoqiang
Shi, Guodong
IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (05) : 3669 - 3684
[5] ON BEST-RESPONSE DYNAMICS IN POTENTIAL GAMES
Swenson, Brian
Murray, Ryan
Kar, Soummya
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2018, 56 (04) : 2734 - 2767
[6] Best-Response Dynamics for Evolutionary Stochastic Games
Murali, Divya
Shaiju, A. J.
INTERNATIONAL GAME THEORY REVIEW, 2023, 25 (04)
[7] Best-Response Cycles in Perfect Information Games
Herings, P. Jean-Jacques
Predtetchinski, Arkadi
MATHEMATICS OF OPERATIONS RESEARCH, 2017, 42 (02) : 427 - 433
[8] Evolutionary games on the lattice: best-response dynamics
Evilsizor, Stephen
Lanchier, Nicolas
ELECTRONIC JOURNAL OF PROBABILITY, 2014, 19
[9] Best-response dynamics in directed network games
Bayer, Peter
Kozics, Gyorgy
Szoke, Nora Gabriella
JOURNAL OF ECONOMIC THEORY, 2023, 213
[10] Active Learning and Best-Response Dynamics
Balcan, Maria-Florina
Berlind, Christopher
Blum, Avrim
Cohen, Emma
Patnaik, Kaushik
Song, Le
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
