Distributed Online Bandit Learning in Dynamic Environments Over Unbalanced Digraphs

Cited by: 17
Authors
Li, Jueyou [1]
Li, Chaojie [2]
Yu, Wenwu [3]
Zhu, Xiaomei [1]
Yu, Xinghuo [4]
Affiliations
[1] Chongqing Normal Univ, Sch Math Sci, Chongqing 401331, Peoples R China
[2] Univ New South Wales, Sch Elect Engn & Telecommun, Sydney, NSW 2033, Australia
[3] Southeast Univ, Sch Math, Jiangshu 211189, Peoples R China
[4] RMIT Univ, Sch Engn, Melbourne, Vic 3000, Australia
Keywords
Multi-agent network; Online learning; Distributed optimization; Mirror descent; Unbalanced digraph; Stochastic mirror descent; Convex optimization; Multiagent optimization; Subgradient methods
DOI
10.1109/TNSE.2021.3093536
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
This work is concerned with distributed online bandit learning over a multi-agent network, where a group of agents cooperatively seek the minimizer of a time-varying global loss function. At each epoch, the global loss function is the sum of local loss functions, each known privately by an individual agent of the network. Furthermore, local functions are revealed to agents sequentially, and no agent has knowledge of future loss functions. Thus, agents of the network must exchange messages to pursue an online estimate of the minimizer of the global loss function. In this paper, we are interested in a bandit setup, where only the values of local loss functions at sampling points are disclosed to agents. Meanwhile, we consider a more general network modeled by unbalanced digraphs, whose corresponding weight matrix is only required to be row stochastic. By extending the celebrated mirror descent algorithm, we first design a distributed bandit online learning method for the online distributed convex problem. We then establish the sublinear expected dynamic regret attained by the algorithm for convex and strongly convex loss functions, respectively, when the accumulative deviation of the minimizer sequence increases sublinearly. Moreover, the expected dynamic regret bound is analysed for strongly convex loss functions. In addition, an expected static regret bound of order O(sqrt(T)) is obtained in the bandit setting, while the corresponding static regret bound of order O(ln T) is also provided for the strongly convex case. Finally, numerical examples are provided to illustrate the efficiency of the method and to verify the theoretical findings.
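For readers unfamiliar with the terms used in the abstract, the quantities it refers to can be written as follows. This is a sketch using the standard definitions from the online optimization literature; the notation (n agents, horizon T, constraint set \mathcal{X}, local losses f_{i,t}, agent decisions x_{j,t}) is assumed here and does not come from this record, and the paper's own definitions may differ in detail.

```latex
% Assumed notation (not taken from the record): n agents, horizon T,
% constraint set \mathcal{X}, local loss f_{i,t} of agent i at time t,
% and decision x_{j,t} of agent j at time t.
\begin{align*}
f_t(x) &= \sum_{i=1}^{n} f_{i,t}(x),
  \qquad x_t^\ast \in \arg\min_{x \in \mathcal{X}} f_t(x), \\
% Dynamic regret: compare against the per-round minimizers x_t^\ast.
\mathrm{Reg}_T^{\mathrm{dyn}}(j) &= \mathbb{E}\Big[\sum_{t=1}^{T} f_t(x_{j,t})\Big]
  - \sum_{t=1}^{T} f_t(x_t^\ast), \\
% Static regret: compare against the single best fixed decision in hindsight.
\mathrm{Reg}_T^{\mathrm{sta}}(j) &= \mathbb{E}\Big[\sum_{t=1}^{T} f_t(x_{j,t})\Big]
  - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x), \\
% Accumulative deviation (path length) of the minimizer sequence.
V_T &= \sum_{t=1}^{T-1} \lVert x_{t+1}^\ast - x_t^\ast \rVert .
\end{align*}
```

Under these definitions, a regret is sublinear when it grows more slowly than T, so the average regret per round tends to zero. The condition that the accumulative deviation of the minimizer sequence increases sublinearly corresponds to V_T growing more slowly than T.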
Pages: 3034-3047
Number of pages: 14
Related papers
50 in total
  • [41] Distributed Optimal Resource Allocation for High-Order Nonlinear Multiagent Systems Over Unbalanced Digraphs
    Zhao, Zeli
    Ding, Jinliang
    Zhang, Jin-Xi
    Shi, Yang
    Chai, Tianyou
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2025, 12 (01): 51-63
  • [42] Multi-Step Subgradient Methods for Distributed Optimization over Unbalanced Digraphs with Local Constraint Sets
    Xiong, Yongyang
    You, Keyou
    Wu, Ligang
    2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2020: 330-335
  • [43] Bandit online optimization over the permutahedron
    Ailon, Nir
    Hatano, Kohei
    Takimoto, Eiji
    THEORETICAL COMPUTER SCIENCE, 2016, 650: 92-108
  • [44] Online Learning Adaptive to Dynamic and Adversarial Environments
    Shen, Yanning
    Chen, Tianyi
    Giannakis, Georgios B.
    2018 IEEE 19TH INTERNATIONAL WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS (SPAWC), 2018: 351-355
  • [45] ONLINE DICTIONARY LEARNING OVER DISTRIBUTED MODELS
    Chen, Jianshu
    Towfic, Zaid J.
    Sayed, Ali H.
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014
  • [46] Distributed Weight Balancing Over Digraphs
    Rikos, Apostolos I.
    Charalambous, Themistoklis
    Hadjicostis, Christoforos N.
    IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2014, 1 (02): 190-201
  • [47] Distributed Online Optimization with Coupled Inequality Constraints over Unbalanced Directed Networks
    Wang, Dandan
    Zhu, Daokuan
    Sou, Kin Cheong
    Lu, Jie
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 1162-1169
  • [48] Logarithmic Regret for Distributed Online Subgradient Method over Unbalanced Directed Networks
    Yamashita, Makoto
    Hayashi, Naoki
    Hatanaka, Takeshi
    Takai, Shigemasa
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2021, E104A (08): 1019-1026
  • [49] Online distributed optimization algorithm with dynamic regret analysis under unbalanced graphs
    Yao, Songquan
    Xie, Siyu
    Li, Tao
    AUTOMATICA, 2025, 174
  • [50] Bandit Online Learning with Unknown Delays
    Li, Bingcong
    Chen, Tianyi
    Giannakis, Georgios B.
    22ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 89, 2019, 89