Policy Space Diversity for Non-Transitive Games

Cited by: 0
Authors
Yao, Jian [1]
Liu, Weiming [1]
Fu, Haobo [1]
Yang, Yaodong [2]
McAleer, Stephen [3]
Fu, Qiang [1]
Yang, Wei [1]
Affiliations
[1] Tencent Lab, Shenzhen, People's Republic of China
[2] Peking University, Beijing, People's Republic of China
[3] Carnegie Mellon University, Pittsburgh, PA, USA
Keywords
LEVEL;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Policy-Space Response Oracles (PSRO) is an influential algorithmic framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have tried to promote policy diversity in PSRO. A major weakness of existing diversity metrics is that a more diverse population (as measured by those metrics) does not necessarily yield a better approximation to an NE, as we prove in the paper. To alleviate this problem, we propose a new diversity metric whose improvement guarantees a better approximation to an NE. We also develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best-response solving step of PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO. Empirically, extensive experiments on various games demonstrate that PSD-PSRO is more effective at producing significantly less exploitable policies than state-of-the-art PSRO variants. The experiment code is available at https://github.com/nigelyaoj/policy-space-diversity-psro.
Pages: 23