Risk-Averse No-Regret Learning in Online Convex Games

Cited by: 0
Authors
Wang, Zifan [1 ]
Shen, Yi [2 ]
Zavlanos, Michael M. [2 ]
Affiliations
[1] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
[2] Duke Univ, Dept Mech Engn & Mat Sci, Durham, NC 27708 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider an online stochastic game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk measure that the agents can estimate using bandit feedback in the form of the cost values of only their selected actions. Since the distributions of the cost functions depend on the actions of all agents, which are generally unobservable, these distributions are themselves unknown and, therefore, the CVaR values of the costs are difficult to compute. To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions. We show that this algorithm achieves sub-linear regret with high probability. We also propose two variants of this algorithm that improve performance. The first variant relies on a new sampling strategy that uses samples from the previous iteration to improve the estimation accuracy of the CVaR values. The second variant employs residual feedback that uses CVaR values from the previous iteration to reduce the variance of the CVaR gradient estimates. We theoretically analyze the convergence properties of these variants and illustrate their performance on an online market problem that we model as a Cournot game.
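To illustrate the core mechanism described in the abstract, the sketch below shows a minimal one-point zeroth-order CVaR gradient estimator and a projected-descent update for a single risk-averse agent. This is not the authors' implementation: the bandit oracle sample_cost, the tail-averaging CVaR estimator, the ball-shaped feasible set, and all step-size and smoothing parameters are illustrative assumptions.

import numpy as np

def cvar_estimate(costs, alpha):
    # Empirical CVaR at level alpha: average of the worst alpha-fraction of the
    # observed costs (the tail-averaging convention used here is an assumption).
    costs = np.sort(np.asarray(costs))[::-1]        # sort descending: worst costs first
    k = max(1, int(np.ceil(alpha * costs.size)))    # number of samples in the alpha-tail
    return costs[:k].mean()

def one_point_cvar_gradient(sample_cost, x, delta, alpha, n_samples, rng):
    # One-point zeroth-order estimate of the CVaR gradient at action x:
    # play a single randomly perturbed action, estimate CVaR from the returned
    # bandit cost samples, and scale the perturbation direction accordingly.
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                          # uniform direction on the unit sphere
    costs = sample_cost(x + delta * u, n_samples)   # bandit feedback at the perturbed action
    return (d / delta) * cvar_estimate(costs, alpha) * u

def project_to_ball(x, radius):
    # Euclidean projection onto a ball; a stand-in for the agent's feasible action set.
    norm = np.linalg.norm(x)
    return x if norm <= radius else radius * x / norm

def risk_averse_play(sample_cost, x0, steps, eta, delta, alpha, n_samples, radius, seed=0):
    # Projected descent driven by the zeroth-order CVaR gradient estimates.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = one_point_cvar_gradient(sample_cost, x, delta, alpha, n_samples, rng)
        x = project_to_ball(x - eta * g, radius)
    return x

As a usage example, sample_cost could be a noisy quadratic such as lambda a, n: (a @ a) + np.random.default_rng().normal(scale=0.5, size=n). The residual-feedback variant mentioned in the abstract would replace the single CVaR estimate inside one_point_cvar_gradient with the difference between the current and previous iterations' CVaR estimates to reduce variance; that refinement is omitted from this sketch.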
Pages: 19
Related Papers
50 records in total
  • [1] A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
    Wang, Zifan
    Shen, Yi
Bell, Zachary I.
    Nivison, Scott
    Zavlanos, Michael M.
    Johansson, Karl H.
    [J]. 2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 5179 - 5184
  • [2] No-regret algorithms in on-line learning, games and convex optimization
    Sorin, Sylvain
    [J]. MATHEMATICAL PROGRAMMING, 2024, 203 (1-2) : 645 - 686
  • [4] No-Regret Learning in Bayesian Games
    Hartline, Jason
    Syrgkanis, Vasilis
    Tardos, Eva
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [5] NO-REGRET NON-CONVEX ONLINE META-LEARNING
    Zhuang, Zhenxun
    Wang, Yunlong
    Yu, Kezi
    Lu, Songtao
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3942 - 3946
  • [6] Near-Optimal No-Regret Learning Dynamics for General Convex Games
    Farina, Gabriele
    Anagnostides, Ioannis
    Luo, Haipeng
    Lee, Chung-Wei
    Kroer, Christian
    Sandholm, Tuomas
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [7] Decision Variance in Risk-Averse Online Learning
    Vakili, Sattar
    Boukouvalas, Alexis
    Zhao, Qing
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 2738 - 2744
  • [8] Limits and limitations of no-regret learning in games
    Monnot, Barnabe
    Piliouras, Georgios
    [J]. KNOWLEDGE ENGINEERING REVIEW, 2017, 32
  • [9] No-Regret Learning in Dynamic Stackelberg Games
    Lauffer, Niklas
    Ghasemi, Mahsa
    Hashemi, Abolfazl
    Savas, Yagiz
    Topcu, Ufuk
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (03) : 1418 - 1431
  • [10] Regret-Minimization in Risk-Averse Bandits
    Agrawal, Shubhada
    Juneja, Sandeep
    Koolen, Wouter M.
    [J]. 2021 SEVENTH INDIAN CONTROL CONFERENCE (ICC), 2021, : 195 - 200