Risk-Averse No-Regret Learning in Online Convex Games

Cited by: 0
Authors
Wang, Zifan [1 ]
Shen, Yi [2 ]
Zavlanos, Michael M. [2 ]
Affiliations
[1] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm, Sweden
[2] Duke Univ, Dept Mech Engn & Mat Sci, Durham, NC 27708 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider an online stochastic game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk measure that the agents can estimate using bandit feedback in the form of the cost values of only their selected actions. Since the distributions of the cost functions depend on the actions of all agents, which are generally unobservable, these distributions are themselves unknown and, therefore, the CVaR values of the costs are difficult to compute. To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions. We show that this algorithm achieves sub-linear regret with high probability. We also propose two variants of this algorithm that improve performance. The first variant relies on a new sampling strategy that uses samples from the previous iteration to improve the estimation accuracy of the CVaR values. The second variant employs residual feedback that uses CVaR values from the previous iteration to reduce the variance of the CVaR gradient estimates. We theoretically analyze the convergence properties of these variants and illustrate their performance on an online market problem that we model as a Cournot game.
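To illustrate the core mechanism described in the abstract, the sketch below shows a minimal one-point zeroth-order CVaR gradient estimator and a projected-descent update for a single risk-averse agent. This is not the authors' implementation: the bandit oracle sample_cost, the tail-averaging CVaR estimator, the ball-shaped feasible set, and all step-size and smoothing parameters are illustrative assumptions.

import numpy as np

def cvar_estimate(costs, alpha):
    # Empirical CVaR at level alpha: average of the worst alpha-fraction of the
    # observed costs (the tail-averaging convention used here is an assumption).
    costs = np.sort(np.asarray(costs))[::-1]        # sort descending: worst costs first
    k = max(1, int(np.ceil(alpha * costs.size)))    # number of samples in the alpha-tail
    return costs[:k].mean()

def one_point_cvar_gradient(sample_cost, x, delta, alpha, n_samples, rng):
    # One-point zeroth-order estimate of the CVaR gradient at action x:
    # play a single randomly perturbed action, estimate CVaR from the returned
    # bandit cost samples, and scale the perturbation direction accordingly.
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                          # uniform direction on the unit sphere
    costs = sample_cost(x + delta * u, n_samples)   # bandit feedback at the perturbed action
    return (d / delta) * cvar_estimate(costs, alpha) * u

def project_to_ball(x, radius):
    # Euclidean projection onto a ball; a stand-in for the agent's feasible action set.
    norm = np.linalg.norm(x)
    return x if norm <= radius else radius * x / norm

def risk_averse_play(sample_cost, x0, steps, eta, delta, alpha, n_samples, radius, seed=0):
    # Projected descent driven by the zeroth-order CVaR gradient estimates.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = one_point_cvar_gradient(sample_cost, x, delta, alpha, n_samples, rng)
        x = project_to_ball(x - eta * g, radius)
    return x

As a usage example, sample_cost could be a noisy quadratic such as lambda a, n: (a @ a) + np.random.default_rng().normal(scale=0.5, size=n). The residual-feedback variant mentioned in the abstract would replace the single CVaR estimate inside one_point_cvar_gradient with the difference between the current and previous iterations' CVaR estimates to reduce variance; that refinement is omitted from this sketch.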
Pages: 19
Related Papers
50 records in total
  • [1] A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
    Wang, Zifan
    Shen, Yi
Bell, Zachary I.
    Nivison, Scott
    Zavlanos, Michael M.
    Johansson, Karl H.
    [J]. 2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 5179 - 5184
  • [2] No-regret algorithms in on-line learning, games and convex optimization
    Sorin, Sylvain
    [J]. MATHEMATICAL PROGRAMMING, 2024, 203 (1-2) : 645 - 686
  • [4] No-Regret Learning in Bayesian Games
    Hartline, Jason
    Syrgkanis, Vasilis
    Tardos, Eva
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [5] NO-REGRET NON-CONVEX ONLINE META-LEARNING
    Zhuang, Zhenxun
    Wang, Yunlong
    Yu, Kezi
    Lu, Songtao
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3942 - 3946
  • [6] Near-Optimal No-Regret Learning Dynamics for General Convex Games
    Farina, Gabriele
    Anagnostides, Ioannis
    Luo, Haipeng
    Lee, Chung-Wei
    Kroer, Christian
    Sandholm, Tuomas
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [7] Decision Variance in Risk-Averse Online Learning
    Vakili, Sattar
    Boukouvalas, Alexis
    Zhao, Qing
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 2738 - 2744
  • [8] Limits and limitations of no-regret learning in games
    Monnot, Barnabe
    Piliouras, Georgios
    [J]. KNOWLEDGE ENGINEERING REVIEW, 2017, 32
  • [9] No-Regret Learning in Dynamic Stackelberg Games
    Lauffer, Niklas
    Ghasemi, Mahsa
    Hashemi, Abolfazl
    Savas, Yagiz
    Topcu, Ufuk
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (03) : 1418 - 1431
  • [10] Regret-Minimization in Risk-Averse Bandits
    Agrawal, Shubhada
    Juneja, Sandeep
    Koolen, Wouter M.
    [J]. 2021 SEVENTH INDIAN CONTROL CONFERENCE (ICC), 2021, : 195 - 200