Distributed Online Bandit Learning in Dynamic Environments Over Unbalanced Digraphs

Cited by: 17
Authors
Li, Jueyou [1]
Li, Chaojie [2]
Yu, Wenwu [3]
Zhu, Xiaomei [1]
Yu, Xinghuo [4]
Affiliations
[1] Chongqing Normal Univ, Sch Math Sci, Chongqing 401331, Peoples R China
[2] Univ New South Wales, Sch Elect Engn & Telecommun, Sydney, NSW 2033, Australia
[3] Southeast Univ, Sch Math, Jiangshu 211189, Peoples R China
[4] RMIT Univ, Sch Engn, Melbourne, Vic 3000, Australia
Keywords
Multi-agent network; Online learning; Distributed optimization; Mirror descent; Unbalanced digraph; STOCHASTIC MIRROR DESCENT; CONVEX-OPTIMIZATION; MULTIAGENT OPTIMIZATION; SUBGRADIENT METHODS;
DOI
10.1109/TNSE.2021.3093536
Chinese Library Classification (CLC) number
T [Industrial Technology];
Subject classification code
08;
Abstract
This work is concerned with distributed online bandit learning over a multi-agent network, in which a group of agents cooperatively seek the minimizer of a time-varying global loss function. At each epoch, the global loss function is the sum of local loss functions, each known privately to an individual agent in the network. The local functions are revealed to the agents sequentially, and no agent has any knowledge of future loss functions. The agents must therefore exchange messages to pursue an online estimate of the global loss function. In this paper, we are interested in a bandit setup, where only the values of the local loss functions at sampling points are disclosed to the agents. We also consider a more general network modeled by unbalanced digraphs, for which the corresponding weight matrix is only required to be row stochastic. By extending the celebrated mirror descent algorithm, we first design a distributed bandit online learning method for the online distributed convex problem. We then establish that the algorithm attains sublinear expected dynamic regret for convex and strongly convex loss functions, respectively, when the accumulated deviation of the minimizer sequence increases sublinearly. Moreover, the expected dynamic regret bound is analysed for strongly convex loss functions. In addition, an expected static regret bound of order O(√T) is obtained in the bandit setting, while the corresponding static regret bound of order O(ln T) is provided for the strongly convex case. Finally, numerical examples are provided to illustrate the efficiency of the method and to verify the theoretical findings.
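To make the bandit feedback model concrete, the sketch below shows one common way to build a gradient estimate from function values alone. The abstract does not say which estimator the paper uses, so the two-point form here, along with the names `two_point_gradient_estimate`, `averaged_estimate`, `linear_loss`, and the smoothing radius `delta`, are illustrative assumptions rather than the authors' method.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def two_point_gradient_estimate(loss, x, delta, rng):
    """Estimate the gradient of `loss` at x using two function values only.

    Assumption: a standard two-point estimator, not taken from the paper.
    """
    dim = len(x)
    u = random_unit_vector(dim, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    # Finite difference along u, scaled by dim so the estimate is
    # unbiased for linear losses.
    scale = dim * (loss(x_plus) - loss(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def averaged_estimate(loss, x, delta, samples, seed=0):
    """Average many independent estimates to show they centre on the gradient."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = two_point_gradient_estimate(loss, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]


def linear_loss(x):
    # Hypothetical local loss with known gradient (1.0, -2.0, 0.5).
    return 1.0 * x[0] - 2.0 * x[1] + 0.5 * x[2]


if __name__ == "__main__":
    print(averaged_estimate(linear_loss, [0.3, -0.1, 0.7], 0.1, 20000))
```

In a distributed scheme of the kind the abstract describes, each agent would plug such an estimate into its local mirror descent step in place of the true gradient, after mixing its neighbours' states with a row-stochastic weight matrix; those steps are not sketched here because the abstract gives no details.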
Pages: 3034-3047
Number of pages: 14
Related papers
50 in total
  • [21] Distributed Inequality Constrained Online Optimization for Unbalanced Digraphs using Row Stochastic Property
    Tada, Keishin
    Hayashi, Naoki
    Takai, Shigemasa
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2283 - 2288
  • [22] Online Learning of Time-Varying Unbalanced Networks in Non-Convex Environments: A Multi-Armed Bandit Approach
    Odeyomi, Olusola T.
    IEEE ACCESS, 2023, 11 : 15567 - 15577
  • [23] Distributed online constrained nonconvex optimization in dynamic environments over directed graphs
    Suo, Wei
    Li, Wenling
    Liu, Yang
    Song, Jia
    SIGNAL PROCESSING, 2025, 230
  • [24] Distributed algorithm for solving variational inequalities over time-varying unbalanced digraphs
    Zhang, Yichen
    Tang, Yutao
    Tu, Zhipeng
    Hong, Yiguang
    CONTROL THEORY AND TECHNOLOGY, 2024, 22 (03) : 431 - 441
  • [25] Distributed Nash Equilibrium Seeking for Aggregative Games over Weight-Unbalanced Digraphs
    Wang, Dong
    Chen, Pan
    Chen, Mingfei
    Lian, Jie
    Wang, Wei
    IFAC PAPERSONLINE, 2022, 55 (03) : 90 - 95
  • [26] Distributed Convex Optimization with Inequality Constraints over Time-Varying Unbalanced Digraphs
    Xie, Pei
    You, Keyou
    Tempo, Roberto
    Song, Shiji
    Wu, Cheng
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (12) : 4331 - 4337
  • [27] Distributed mirror descent algorithm over unbalanced digraphs based on gradient weighting technique
    Shi, Chong-Xiao
    Yang, Guang-Hong
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2023, 360 (14) : 10656 - 10680
  • [28] Primal-Dual Subgradient Algorithm for Distributed Constraint Optimization Over Unbalanced Digraphs
    Yang, Qing
    Chen, Gang
    IEEE ACCESS, 2019, 7 : 85190 - 85202
  • [29] Differentially private distributed online learning over time-varying digraphs via dual averaging
    Han, Dongyu
    Liu, Kun
    Lin, Yeming
    Xia, Yuanqing
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2022, 32 (05) : 2485 - 2499
  • [30] Differentially private resilient distributed cooperative online estimation over digraphs
    Wang, Jimin
    Zhang, Ji-Feng
    Liu, Xiao-Kang
    INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2022, 32 (15) : 8670 - 8688