Online convex optimization in the bandit setting: gradient descent without a gradient

Cited: 0
Authors
Flaxman, Abraham D. [1 ]
Kalai, Adam Tauman [1 ]
McMahan, H. Brendan [1 ]
Affiliation
[1] Carnegie Mellon Univ, Dept Math Sci, Pittsburgh, PA 15213 USA
Keywords
DOI
Not available
CLC number
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c_1, c_2, ..., and in each period, we choose a feasible point x_t in S and learn the cost c_t(x_t). If the function c_t is also revealed after each period, then, as Zinkevich shows in [25], gradient descent can be used on these functions to obtain regret bounds of O(√n). That is, after n rounds, the total cost incurred will be at most O(√n) more than the cost of the best single feasible decision chosen with the benefit of hindsight, min_x Σ_t c_t(x). We extend this to the "bandit" setting, where, in each period, only the cost c_t(x_t) is revealed, and bound the expected regret by O(n^{3/4}). Our approach uses a simple approximation of the gradient that is computed from evaluating c_t at a single (random) point. We show that this biased estimate is sufficient to approximate gradient descent on the sequence of functions. In other words, it is possible to use gradient descent without seeing anything more than the value of the functions at a single point. The guarantees hold even in the most general case: online against an adaptive adversary. For the online linear optimization problem [15], algorithms with low regret in the bandit setting have recently been given against oblivious [1] and adaptive adversaries [19]. In contrast to these algorithms, which distinguish between explicit explore and exploit periods, our algorithm can be interpreted as doing a small amount of exploration in each period.
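To make the one-point gradient estimate described in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes the feasible set S is the unit Euclidean ball (so projection is a simple rescaling) and uses illustrative step-size and exploration parameters eta and delta rather than the values used in the paper's O(n^{3/4}) analysis; all function names here are hypothetical.

```python
# Minimal sketch of bandit gradient descent with a one-point gradient estimate.
# Assumptions (not from the paper's exact setup): S is the unit Euclidean ball,
# and eta/delta are illustrative constants rather than tuned schedules.
import numpy as np

def project_to_ball(x, radius=1.0):
    """Euclidean projection onto a ball (stand-in for projection onto a general convex S)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def bandit_gradient_descent(cost_fns, dim, eta=0.01, delta=0.1):
    """Each round evaluates the cost at a single random point (bandit feedback only)."""
    x = np.zeros(dim)                      # current decision, kept slightly inside S
    total_cost = 0.0
    for c in cost_fns:
        u = np.random.randn(dim)
        u /= np.linalg.norm(u)             # uniformly random unit vector
        y = x + delta * u                  # the single point actually played
        cost = c(y)                        # only this value is revealed
        total_cost += cost
        g = (dim / delta) * cost * u       # one-point (biased) gradient estimate
        x = project_to_ball(x - eta * g, radius=1.0 - delta)  # shrunken set keeps y feasible
    return total_cost

# Toy usage: a fixed (oblivious) sequence of linear cost functions.
rng = np.random.default_rng(0)
costs = [(lambda z, a=rng.normal(size=3): float(a @ z)) for _ in range(1000)]
print(bandit_gradient_descent(costs, dim=3))
```

The design idea being illustrated: playing x_t + δu_t for a random unit vector u_t and scaling the observed cost by d/δ yields an estimate of the gradient of a smoothed version of c_t, which is the sense in which a single function value can substitute for a gradient.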
Pages: 385-394
Number of pages: 10
Related papers
50 records in total
  • [1] (Bandit) Convex Optimization with Biased Noisy Gradient Oracles
    Hu, Xiaowei
    Prashanth, L. A.
    Gyorgy, Andras
    Szepesvari, Csaba
    [J]. ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 819 - 828
  • [2] Efficient displacement convex optimization with particle gradient descent
    Daneshmand, Hadi
    Lee, Jason D.
    Jin, Chi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [3] Evolutionary Gradient Descent for Non-convex Optimization
    Xue, Ke
    Qian, Chao
    Xu, Ling
    Fei, Xudong
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 3221 - 3227
  • [4] ON THE PRIVACY OF NOISY STOCHASTIC GRADIENT DESCENT FOR CONVEX OPTIMIZATION
    Altschuler, Jason M.
    Bok, Jinho
    Talwar, Kunal
    [J]. SIAM JOURNAL ON COMPUTING, 2024, 53 (04) : 969 - 1001
  • [5] Learning to Learn without Gradient Descent by Gradient Descent
    Chen, Yutian
    Hoffman, Matthew W.
    Colmenarejo, Sergio Gomez
    Denil, Misha
    Lillicrap, Timothy P.
    Botvinick, Matt
    de Freitas, Nando
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [6] Online Lazy Gradient Descent is Universal on Strongly Convex Domains
    Anderson, Daron
    Leith, Douglas
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [7] Gradient learning in a classification setting by gradient descent
    Cai, Jia
    Wang, Hongyan
    Zhou, Ding-Xuan
    [J]. JOURNAL OF APPROXIMATION THEORY, 2009, 161 (02) : 674 - 692
  • [8] Adaptive Stochastic Gradient Descent Method for Convex and Non-Convex Optimization
    Chen, Ruijuan
    Tang, Xiaoquan
    Li, Xiuting
    [J]. FRACTAL AND FRACTIONAL, 2022, 6 (12)
  • [9] Stein Variational Gradient Descent Without Gradient
    Han, Jun
    Liu, Qiang
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [10] An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling
    Ding, Qin
    Hsieh, Cho-Jui
    Sharpnack, James
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130