Efficient Learning in Polyhedral Games via Best-Response Oracles

Cited by: 0

Authors:

Chakrabarti, Darshan [1]

Farina, Gabriele [2]

Kroer, Christian [1]

Affiliations:

[1] Columbia Univ, New York, NY 10027 USA

[2] MIT, Cambridge, MA 02139 USA

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 9 | 2024

Funding:

National Science Foundation (USA);

Keywords:

COMPUTATION;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study online learning and equilibrium computation in games with polyhedral decision sets, a property shared by normal-form games (NFGs) and extensive-form games (EFGs), when the learning agent is restricted to using a best-response oracle. We show how to achieve constant regret in zero-sum games and $O(T^{1/4})$ regret in general-sum games while using only $O(\log t)$ best-response queries at a given iteration $t$, thus improving over the best prior result, which required $O(T)$ queries per iteration. Moreover, our framework yields the first last-iterate convergence guarantees for self-play with best-response oracles in zero-sum games. This convergence occurs at a linear rate, though with a condition-number dependence. We go on to show an $O(1/\sqrt{T})$ best-iterate convergence rate without such a dependence. Our results build on linear-rate convergence results for variants of the Frank-Wolfe (FW) algorithm for strongly convex and smooth minimization problems over polyhedral domains. These FW results depend on a condition number of the polytope, known as facial distance. In order to enable application to settings such as EFGs, we show two broad new results: 1) the facial distance for polytopes of the form $\{x \in \mathbb{R}^n_{\geq 0} \mid Ax = b\}$ is at least $\gamma/\sqrt{k}$, where $\gamma$ is the minimum value of a nonzero coordinate of a vertex in the polytope and $k \leq n$ is the number of tight inequality constraints in the optimal face, and 2) the facial distance for polytopes of the form $Ax = b$, $Cx \leq d$, $x \geq 0$, where $x \in \mathbb{R}^n$, $C \geq 0$ is a nonzero integral matrix, and $d \geq 0$, is at least $1/(\|C\|_\infty \sqrt{n})$. This yields the first such results for several problems, such as sequence-form polytopes, flow polytopes, and matching polytopes.
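
To illustrate how a best-response oracle drives a Frank-Wolfe-style method, here is a minimal sketch in Python. It is not the algorithm from the paper: it runs the textbook (vanilla) Frank-Wolfe iteration with the standard 2/(t+2) step size, not the linear-rate variants the abstract refers to, and it uses the probability simplex as a stand-in polytope. The function names, the quadratic objective, and the target point are all hypothetical choices made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def best_response_simplex(gradient: Vector) -> Vector:
    """Linear minimization oracle over the probability simplex.

    Returns the vertex e_i that minimizes <gradient, x>. In a game this
    plays the role of a best response to the current loss vector.
    (Hypothetical helper, not taken from the paper.)
    """
    best = min(range(len(gradient)), key=lambda i: gradient[i])
    return [1.0 if i == best else 0.0 for i in range(len(gradient))]


def frank_wolfe(
    grad: Callable[[Vector], Vector],
    oracle: Callable[[Vector], Vector],
    x0: Vector,
    iterations: int,
) -> Vector:
    """Vanilla Frank-Wolfe: one oracle call per iteration."""
    x = list(x0)
    for t in range(iterations):
        g = grad(x)                # gradient at the current iterate
        v = oracle(g)              # vertex returned by the best-response oracle
        step = 2.0 / (t + 2.0)     # standard open-loop step size
        # Convex combination keeps the iterate inside the polytope.
        x = [(1.0 - step) * xi + step * vi for xi, vi in zip(x, v)]
    return x


# Assumed toy objective: f(x) = 0.5 * ||x - target||^2, strongly convex and smooth.
target = [0.2, 0.5, 0.3]


def quadratic_grad(x: Vector) -> Vector:
    return [xi - ci for xi, ci in zip(x, target)]


solution = frank_wolfe(quadratic_grad, best_response_simplex, [1.0, 0.0, 0.0], 2000)
print(solution)
```

The iterate only ever moves toward vertices returned by the oracle, so no projection onto the polytope is needed. This is the property that makes oracle-based methods attractive for large polytopes such as sequence-form strategy sets.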


Pages: 9564-9572

Page count: 9
