Online convex optimization in the bandit setting: gradient descent without a gradient

Cited: 0
Authors
Flaxman, Abraham D. [1 ]
Kalai, Adam Tauman [1 ]
McMahan, H. Brendan [1 ]
Affiliation
[1] Carnegie Mellon Univ, Dept Math Sci, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
CLC number
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c_1, c_2, ..., and in each period, we choose a feasible point x_t in S and learn the cost c_t(x_t). If the function c_t is also revealed after each period, then, as Zinkevich shows in [25], gradient descent can be used on these functions to obtain regret bounds of O(√n). That is, after n rounds, the total cost incurred will be at most O(√n) more than the cost of the best single feasible decision chosen with the benefit of hindsight, min_x Σ_t c_t(x). We extend this to the "bandit" setting, where, in each period, only the cost c_t(x_t) is revealed, and bound the expected regret by O(n^{3/4}). Our approach uses a simple approximation of the gradient that is computed from evaluating c_t at a single (random) point. We show that this biased estimate is sufficient to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent without seeing anything more than the value of the functions at a single point. The guarantees hold even in the most general case: online against an adaptive adversary. For the online linear optimization problem [15], algorithms with low regret in the bandit setting have recently been given against oblivious [1] and adaptive adversaries [19]. In contrast to these algorithms, which distinguish between explicit explore and exploit periods, our algorithm can be interpreted as doing a small amount of exploration in each period.
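To make the one-point gradient estimate described in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes the feasible set S is the unit Euclidean ball (so projection is a simple rescaling) and uses illustrative step-size and exploration parameters eta and delta rather than the values used in the paper's O(n^{3/4}) analysis; all function names here are hypothetical.

```python
# Minimal sketch of bandit gradient descent with a one-point gradient estimate.
# Assumptions (not from the paper's exact setup): S is the unit Euclidean ball,
# and eta/delta are illustrative constants rather than tuned schedules.
import numpy as np

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto a ball (stand-in for projection onto a general convex S)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def bandit_gradient_descent(cost_fns, dim, eta=0.01, delta=0.1):
    """Each round evaluates the cost at a single random point (bandit feedback only)."""
    x = np.zeros(dim)                      # current decision, kept slightly inside S
    total_cost = 0.0
    for c in cost_fns:
        u = np.random.randn(dim)
        u /= np.linalg.norm(u)             # uniformly random unit vector
        y = x + delta * u                  # the single point actually played
        cost = c(y)                        # only this value is revealed
        total_cost += cost
        g = (dim / delta) * cost * u       # one-point (biased) gradient estimate
        x = project_to_ball(x - eta * g, radius=1.0 - delta)  # shrunken set keeps y feasible
    return total_cost

# Toy usage: a fixed (oblivious) sequence of linear cost functions.
rng = np.random.default_rng(0)
costs = [(lambda z, a=rng.normal(size=3): float(a @ z)) for _ in range(1000)]
print(bandit_gradient_descent(costs, dim=3))
```

The design idea being illustrated: playing x_t + δu_t for a random unit vector u_t and scaling the observed cost by d/δ yields an estimate of the gradient of a smoothed version of c_t, which is the sense in which a single function value can substitute for a gradient.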
Pages: 385-394
Number of pages: 10
Related papers
50 records in total
  • [1] (Bandit) Convex Optimization with Biased Noisy Gradient Oracles
    Hu, Xiaowei
    Prashanth, L. A.
    Gyorgy, Andras
    Szepesvari, Csaba
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 819 - 828
  • [2] Efficient displacement convex optimization with particle gradient descent
    Daneshmand, Hadi
    Lee, Jason D.
    Jin, Chi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [3] Evolutionary Gradient Descent for Non-convex Optimization
    Xue, Ke
    Qian, Chao
    Xu, Ling
    Fei, Xudong
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3221 - 3227
  • [4] ON THE PRIVACY OF NOISY STOCHASTIC GRADIENT DESCENT FOR CONVEX OPTIMIZATION
    Altschuler, Jason M.
    Bok, Jinho
    Talwar, Kunal
    [J]. SIAM JOURNAL ON COMPUTING, 2024, 53 (04) : 969 - 1001
  • [5] Learning to Learn without Gradient Descent by Gradient Descent
    Chen, Yutian
    Hoffman, Matthew W.
    Colmenarejo, Sergio Gomez
    Denil, Misha
    Lillicrap, Timothy P.
    Botvinick, Matt
    de Freitas, Nando
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [6] Online Lazy Gradient Descent is Universal on Strongly Convex Domains
    Anderson, Daron
    Leith, Douglas
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [7] Gradient learning in a classification setting by gradient descent
    Cai, Jia
    Wang, Hongyan
    Zhou, Ding-Xuan
    [J]. JOURNAL OF APPROXIMATION THEORY, 2009, 161 (02) : 674 - 692
  • [8] Adaptive Stochastic Gradient Descent Method for Convex and Non-Convex Optimization
    Chen, Ruijuan
    Tang, Xiaoquan
    Li, Xiuting
    [J]. FRACTAL AND FRACTIONAL, 2022, 6 (12)
  • [9] Stein Variational Gradient Descent Without Gradient
    Han, Jun
    Liu, Qiang
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [10] An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling
    Ding, Qin
    Hsieh, Cho-Jui
    Sharpnack, James
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130