Distributed Online Bandit Learning in Dynamic Environments Over Unbalanced Digraphs

Cited by: 17

Authors:

Li, Jueyou [1]

Li, Chaojie [2]

Yu, Wenwu [3]

Zhu, Xiaomei [1]

Yu, Xinghuo [4]

Affiliations:

[1] Chongqing Normal Univ, Sch Math Sci, Chongqing 401331, Peoples R China

[2] Univ New South Wales, Sch Elect Engn & Telecommun, Sydney, NSW 2033, Australia

[3] Southeast Univ, Sch Math, Jiangsu 211189, Peoples R China

[4] RMIT Univ, Sch Engn, Melbourne, Vic 3000, Australia

Source:

IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING | 2021 / Vol. 8 / No. 4

Keywords:

Multi-agent network; Online learning; Distributed optimization; Mirror descent; Unbalanced digraph; Stochastic mirror descent; Convex optimization; Multiagent optimization; Subgradient methods

DOI:

10.1109/TNSE.2021.3093536

Chinese Library Classification (CLC) number:

T [Industrial Technology]

Subject classification code:

08

Abstract:

This work is concerned with distributed online bandit learning over a multi-agent network, where a group of agents cooperatively seek the minimizer of a time-varying global loss function. At each epoch, the global loss function is the sum of local loss functions, each known privately by an individual agent. Furthermore, the local functions are revealed to the agents sequentially, and no agent has knowledge of future loss functions. Thus, the agents must exchange messages to pursue an online estimate of the global loss function. In this paper, we are interested in a bandit setup, where only the values of the local loss functions at sampling points are disclosed to the agents. Meanwhile, we consider a more general network described by unbalanced digraphs, whose weight matrix is only required to be row stochastic. By extending the celebrated mirror descent algorithm, we first design a distributed bandit online learning method for the online distributed convex problem. We then establish sublinear expected dynamic regret for the algorithm, for convex and strongly convex loss functions respectively, provided that the accumulated deviation of the minimizer sequence grows sublinearly. Moreover, the expected dynamic regret bound is analysed for strongly convex loss functions. In addition, an expected static regret bound of order O(√T) is obtained in the bandit setting, and a static regret bound of order O(ln T) is provided for the strongly convex case. Finally, numerical examples are provided to illustrate the efficiency of the method and to verify the theoretical findings.
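The abstract describes the method only at a high level. As a rough illustration, and explicitly not the authors' algorithm, the sketch below combines three standard ingredients under our own assumptions: (1) a one-point bandit gradient estimate built from a single function value f(x + δu), with u uniform on the unit sphere; (2) consensus with a row-stochastic weight matrix A, plus a running vector y_i (initialised to the i-th unit vector) that converges to the left Perron eigenvector π of A, so each local step can be scaled by 1/[y_i]_i to compensate for A not being column stochastic; and (3) a mirror-descent step with the Euclidean mirror map ψ(x) = ½‖x‖², for which the mirror step reduces to a projected gradient step. The ball constraint, the step size α/√t, the drifting quadratic losses, and all function names are hypothetical.

```python
import math
import random

# Hypothetical sketch, not the paper's method: Euclidean mirror map assumed,
# so each mirror-descent step is a projected gradient step onto a ball.


def unit_sphere_sample(d, rng):
    """Draw u uniformly from the unit sphere in R^d (normalised Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project_ball(x, radius):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return list(x)
    return [c * radius / norm for c in x]


def one_point_gradient(f, x, delta, rng):
    """Bandit feedback: only the scalar f(x + delta*u) is observed."""
    d = len(x)
    u = unit_sphere_sample(d, rng)
    value = f([xk + delta * uk for xk, uk in zip(x, u)])
    return [(d / delta) * value * uk for uk in u]


def mix(A, rows):
    """One consensus step with row-stochastic A: out_i = sum_j A[i][j] * rows[j]."""
    n, dim = len(A), len(rows[0])
    return [[sum(A[i][j] * rows[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)]


def distributed_bandit_md(A, loss, d, T, radius=1.0, delta=0.1, alpha=0.05, seed=0):
    """Run T rounds; loss(t, i) returns agent i's local loss f_{i,t}."""
    rng = random.Random(seed)
    n = len(A)
    x = [[0.0] * d for _ in range(n)]
    # y[i] -> pi (pi^T A = pi^T), so y[i][i] -> pi_i corrects the imbalance of A.
    y = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    inner = radius - delta  # shrunk ball keeps the probe x + delta*u feasible
    for t in range(1, T + 1):
        step = alpha / math.sqrt(t)
        grads = [one_point_gradient(loss(t, i), x[i], delta, rng) for i in range(n)]
        z = mix(A, x)
        x = [project_ball([zk - step * gk / y[i][i] for zk, gk in zip(z[i], grads[i])], inner)
             for i in range(n)]
        y = mix(A, y)
    return x, y


# Usage: 3 agents on a strongly connected digraph with self-loops.
A = [[0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0],
     [1 / 3, 1 / 3, 1 / 3]]  # rows sum to 1, columns do not (unbalanced)


def loss(t, i):
    # Hypothetical drifting quadratic targets (a "dynamic environment").
    c = [0.3 * math.cos(t / 100 + i), 0.3 * math.sin(t / 100 + i)]
    return lambda x: sum((xk - ck) ** 2 for xk, ck in zip(x, c))


x, y = distributed_bandit_md(A, loss, d=2, T=300)
print([round(v, 4) for v in y[0]])  # expected: [0.4444, 0.2222, 0.3333]
```

For this A the left Perron vector is π = (4/9, 2/9, 1/3), which every y_i recovers. The one-point estimate is unbiased only for the gradient of a δ-smoothed version of the loss and has high variance, which is one reason bandit regret bounds are weaker than their full-information counterparts.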

Pages: 3034-3047

Number of pages: 14

Related articles (50 in total):

[1] Constrained distributed online convex optimization with bandit feedback for unbalanced digraphs
Tada, Keishin
Hayashi, Naoki
Takai, Shigemasa
IET CONTROL THEORY AND APPLICATIONS, 2024, 18 (02): 184-200
[2] Distributed Online Learning Algorithm for Noncooperative Games Over Unbalanced Digraphs
Deng, Zhenhua
Zuo, Xiaolong
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (11): 15846-15856
[3] Online Distributed Constrained Optimization Over General Unbalanced Digraphs
Yang, Qing
Chen, Gang
PROCEEDINGS OF THE 32ND 2020 CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2020), 2020: 5671-5676
[4] Distributed Online Learning Algorithms for Aggregative Games Over Time-Varying Unbalanced Digraphs
Zuo, Xiaolong
Deng, Zhenhua
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023: 2278-2283
[5] Decentralized Online Bandit Federated Learning Over Unbalanced Directed Networks
Gao, Wang
Zhao, Zhongyuan
Wei, Mengli
Yang, Ju
Zhang, Xiaogang
Li, Jinsong
IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2024, 11 (05): 4264-4277
[6] Byzantine-Resilient Distributed Bandit Online Optimization in Dynamic Environments
Wei, Mengli
Yu, Wenwu
Liu, Hongzhe
Chen, Duxin
IEEE TRANSACTIONS ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, 2024, 2: 154-165
[7] A Privacy-Masking Learning Algorithm for Online Distributed Optimization over Time-Varying Unbalanced Digraphs
Hu, Rong
Zhang, Binru
JOURNAL OF MATHEMATICS, 2021, 2021
[8] Distributed Dynamic Online Linear Regression Over Unbalanced Graphs
Cheng, Songsong
Hong, Yiguang
2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2020: 870-875
[9] Distributed online path-length-independent algorithm for noncooperative games over unbalanced digraphs
Deng, Zhenhua
AUTOMATICA, 2025, 175
