Efficient Learning in Polyhedral Games via Best-Response Oracles

Cited by: 0
Authors
Chakrabarti, Darshan [1]
Farina, Gabriele [2]
Kroer, Christian [1]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] MIT, Cambridge, MA 02139 USA
Funding
National Science Foundation (USA)
Keywords
COMPUTATION;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study online learning and equilibrium computation in games with polyhedral decision sets, a property shared by normal-form games (NFGs) and extensive-form games (EFGs), when the learning agent is restricted to utilizing a best-response oracle. We show how to achieve constant regret in zero-sum games and $O(T^{1/4})$ regret in general-sum games while using only $O(\log t)$ best-response queries at a given iteration $t$, thus improving over the best prior result, which required $O(T)$ queries per iteration. Moreover, our framework yields the first last-iterate convergence guarantees for self-play with best-response oracles in zero-sum games. This convergence occurs at a linear rate, though with a condition-number dependence. We go on to show an $O(1/\sqrt{T})$ best-iterate convergence rate without such a dependence. Our results build on linear-rate convergence results for variants of the Frank-Wolfe (FW) algorithm for strongly convex and smooth minimization problems over polyhedral domains. These FW results depend on a condition number of the polytope, known as facial distance. In order to enable application to settings such as EFGs, we show two broad new results: 1) the facial distance for polytopes of the form $\{x \in \mathbb{R}^n_{\geq 0} \mid Ax = b\}$ is at least $\gamma/\sqrt{k}$, where $\gamma$ is the minimum value of a nonzero coordinate of a vertex in the polytope and $k \leq n$ is the number of tight inequality constraints in the optimal face, and 2) the facial distance for polytopes of the form $Ax = b$, $Cx \leq d$, $x \geq 0$, where $x \in \mathbb{R}^n$, $C \geq 0$ is a nonzero integral matrix, and $d \geq 0$, is at least $1/(\|C\|_\infty \sqrt{n})$. This yields the first such results for several problems, such as sequence-form polytopes, flow polytopes, and matching polytopes.
Pages: 9564-9572
Page count: 9