Online Convex Optimization With Time-Varying Constraints and Bandit Feedback

Cited by: 61
Authors
Cao, Xuanyu [1]
Liu, K. J. Ray [2]
Affiliations
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
[2] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
Keywords
Bandit feedback; constrained optimization; online convex optimization (OCO); stochastic optimization; ALGORITHMS; REGRET;
DOI
10.1109/TAC.2018.2884653
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, the online convex optimization problem with time-varying constraints is studied from the perspective of an agent taking sequential actions. Both the objective function and the constraint functions are dynamic and unknown a priori to the agent. We first consider the gradient feedback scenario, in which the values and gradients of the objective function and constraint functions at the chosen action are revealed after an action is submitted. We propose a computationally efficient online algorithm that involves only direct closed-form computations at each time instant. It is shown that the algorithm possesses sublinear regret with respect to the dynamic benchmark sequence and sublinear constraint violations, as long as the drift of the benchmark sequence is sublinear, or in other words, the underlying dynamic optimization problems do not vary too drastically. Furthermore, we investigate the bandit feedback scenario, in which, after an action is chosen, only the values of the objective function and the constraint functions at several random points close to the action are announced to the agent. A bandit version of the online algorithm is proposed, and we establish its sublinear expected regret and sublinear expected constraint violations under the assumption that the drift of the benchmark sequence is sublinear. Finally, two numerical examples, namely online quadratic programming and online logistic regression, are presented to corroborate the effectiveness of the proposed algorithms and to confirm the theoretical guarantees.
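The two feedback models described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it assumes a generic projected primal-dual update for the gradient feedback case and a standard two-point gradient estimator for the bandit case. The function names, the ball-shaped action set, and the parameters `step`, `radius`, and `delta` are all hypothetical choices made for illustration.

```python
import numpy as np

def project_ball(x, radius):
    """Euclidean projection onto a ball (assumed action set, not from the paper)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def primal_dual_step(x, lam, grad_f, g_val, grad_g, step, radius):
    """One generic online primal-dual update with a single constraint g(x) <= 0."""
    # Primal descent on the Lagrangian f(x) + lam * g(x), followed by projection.
    x_new = project_ball(x - step * (grad_f + lam * grad_g), radius)
    # Dual ascent on the observed constraint value, clipped to stay nonnegative.
    lam_new = max(0.0, lam + step * g_val)
    return x_new, lam_new

def two_point_gradient(func, x, delta, rng):
    """Gradient estimate built only from function values at two points near x."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
    return (d / (2.0 * delta)) * (func(x + delta * u) - func(x - delta * u)) * u
```

In the bandit setting, the output of `two_point_gradient` would stand in for `grad_f` and `grad_g` in `primal_dual_step`, since only function values near the chosen action are observed.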
Pages: 2665-2680
Number of pages: 16
Related papers
50 in total
  • [1] On the Time-Varying Constraints and Bandit Feedback of Online Convex Optimization
    Cao, Xuanyu
    Liu, K. J. Ray
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2018,
  • [2] Online Learning Algorithm for Distributed Convex Optimization With Time-Varying Coupled Constraints and Bandit Feedback
    Li, Jueyou
    Gu, Chuanye
    Wu, Zhiyou
    Huang, Tingwen
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1009 - 1020
  • [3] Distributed Bandit Online Convex Optimization With Time-Varying Coupled Inequality Constraints
    Yi, Xinlei
    Li, Xiuxian
    Yang, Tao
    Xie, Lihua
    Chai, Tianyou
    Johansson, Karl Henrik
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2021, 66 (10) : 4620 - 4635
  • [4] A Distributed Primal-Dual Algorithm for Bandit Online Convex Optimization with Time-Varying Coupled Inequality Constraints
    Yi, Xinlei
    Li, Xiuxian
    Yang, Tao
    Xie, Lihua
    Chai, Tianyou
    Johansson, Karl H.
    [J]. 2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 327 - 332
  • [5] Distributed Online Convex Optimization With Time-Varying Coupled Inequality Constraints
    Yi, Xinlei
    Li, Xiuxian
    Xie, Lihua
    Johansson, Karl H.
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2020, 68 : 731 - 746
  • [6] A Distributed Algorithm for Online Convex Optimization with Time-Varying Coupled Inequality Constraints
    Yi, Xinlei
    Li, Xiuxian
    Xie, Lihua
    Johansson, Karl H.
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 555 - 560
  • [7] Online Primal-Dual Methods With Measurement Feedback for Time-Varying Convex Optimization
    Bernstein, Andrey
    Dall'Anese, Emiliano
    Simonetto, Andrea
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (08) : 1978 - 1991
  • [8] Distributed online optimization subject to long-term constraints and time-varying topology: An event-triggered and bandit feedback approach
    Zhang, Difeng
    Feng, Zhangcheng
    Xu, Wenying
    Yang, Shaofu
    Cao, Jinde
    [J]. JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2024, 361 (16):
  • [9] Privacy Preserving Distributed Bandit Residual Feedback Online Optimization Over Time-Varying Unbalanced Graphs
    Zhongyuan Zhao
    Zhiqiang Yang
    Luyao Jiang
    Ju Yang
    Quanbo Ge
    [J]. IEEE/CAA Journal of Automatica Sinica, 2024, 11 (11) - 2297
  • [10] Simultaneously achieving sublinear regret and constraint violations for online convex optimization with time-varying constraints
    Liu, Qingsong
    Wu, Wenfei
    Huang, Longbo
    Fang, Zhixuan
    [J]. PERFORMANCE EVALUATION, 2021, 152