Is Learning in Games Good for the Learners?

Cited by: 0
Authors
Brown, William [1]
Schneider, Jon [2]
Vodrahalli, Kiran [2]
Affiliations
[1] Columbia Univ, New York, NY 10027 USA
[2] Google Res, Mountain View, CA USA
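Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract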
We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of generalized equilibrium which allows for asymmetric regret constraints and yields polytopes of feasible values for each agent and pair of regret constraints. We show that any such equilibrium is reachable by a pair of algorithms that maintain their regret guarantees against arbitrary opponents. As a central example, we highlight the case where one agent is no-swap-regret and the other's regret is unconstrained. We show that this captures an extension of Stackelberg equilibria with a matching optimal value, and that there exists a wide class of games where a player can significantly increase their utility by deviating from a no-swap-regret algorithm against a no-swap-regret learner (in fact, almost any game without pure Nash equilibria is of this form). Additionally, we make use of generalized equilibria to consider tradeoffs in terms of the opponent's algorithm choice. We give a tight characterization of the maximal reward obtainable against some no-regret learner, yet we also show a class of games in which this is bounded away from the value obtainable against the class of common "mean-based" no-regret algorithms. Finally, we consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown. Again we show tradeoffs depending on the opponent's learning algorithm: the Stackelberg strategy is learnable in exponential time with any no-regret agent (and in polynomial time with any no-adaptive-regret agent) for any game where it is learnable via queries, and there are games where it is learnable in polynomial time against any no-swap-regret agent but requires exponential time against a mean-based no-regret agent.
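The abstract contrasts no-regret and no-swap-regret guarantees. As a minimal sketch of the two quantities, using the standard textbook definitions rather than anything taken from the paper itself, the following computes external and swap regret for one player over a fixed history of play. The function names, the payoff-matrix layout `u[own_action][opponent_action]`, and the matching-pennies example are all illustrative assumptions.

```python
from typing import List, Sequence


def external_regret(u: Sequence[Sequence[float]],
                    actions: List[int],
                    opp_actions: List[int]) -> float:
    """Gain from the best single fixed action in hindsight (standard definition)."""
    realized = sum(u[a][b] for a, b in zip(actions, opp_actions))
    best_fixed = max(sum(u[alt][b] for b in opp_actions) for alt in range(len(u)))
    return best_fixed - realized


def swap_regret(u: Sequence[Sequence[float]],
                actions: List[int],
                opp_actions: List[int]) -> float:
    """Gain from the best per-action replacement rule a -> a' in hindsight."""
    total = 0.0
    for a in range(len(u)):
        # Opponent actions in exactly those rounds where we played `a`.
        rounds = [b for x, b in zip(actions, opp_actions) if x == a]
        base = sum(u[a][b] for b in rounds)
        best = max(sum(u[alt][b] for b in rounds) for alt in range(len(u)))
        total += best - base
    return total


# Hypothetical example: matching pennies, payoffs for the row player.
u = [[1, -1], [-1, 1]]
actions = [0, 1, 0, 1]      # row player's history (always mismatched)
opp_actions = [1, 0, 1, 0]  # column player's history

ext = external_regret(u, actions, opp_actions)
swp = swap_regret(u, actions, opp_actions)
print(ext, swp)
```

Swap regret is always at least external regret, since replacing every action with the same fixed action is one admissible replacement rule. This is why a no-swap-regret guarantee is the stronger of the two constraints.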
Pages: 22