Compression for Multi-Arm Bandits

Cited by: 0
Authors
Hanna, O. A. [1]
Yang, L. F. [1]
Fragouli, C. [1]
Affiliations
[1] University of California, Los Angeles, Electrical and Computer Engineering Department, Los Angeles, CA 90095
Keywords
communication constraints; compression; contextual bandits; distributed multi-armed bandits
DOI
10.1109/JSAIT.2023.3260770
CLC Classification
TN7 [basic electronic circuits]
Discipline Code
080902
Abstract
The multi-armed bandit (MAB) problem is one of the most well-known active learning frameworks. The aim is to select the best among a set of actions by sequentially observing rewards that come from an unknown distribution. Recently, a number of distributed bandit applications have become popular over wireless networks, where agents geographically separated from a learner collect and communicate the observed rewards. In this paper, we propose a compression scheme that compresses the rewards collected by the distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, which can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart. QuBan requires only a few bits (converging to as low as 3 bits as the number of iterations increases) to be sent per reward while preserving the same regret bound as uncompressed rewards. Our lower bound is established by constructing hard instances from a sub-Gaussian distribution. Our theory is further corroborated by numerical experiments. © 2020 IEEE.
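The abstract does not spell out QuBan's internals, but the core idea of sending a reward with only a few bits while preserving the learner's mean estimates can be illustrated with unbiased stochastic quantization. The sketch below is an illustrative assumption, not the paper's actual algorithm: it assumes rewards lie in a known range `[lo, hi]` and rounds each reward to one of `2**bits` grid levels, randomized so the dequantized value equals the true reward in expectation.

```python
import random

def stochastic_quantize(r, lo=-1.0, hi=1.0, bits=3):
    """Unbiased stochastic rounding of a reward onto a 2**bits-level grid.

    Illustrative sketch only (not the QuBan algorithm itself). Returns
    (index, dequantized value); the index is what an agent would transmit
    with `bits` bits, and the dequantized value equals r in expectation,
    so a bandit algorithm fed these values keeps unbiased mean estimates.
    """
    levels = 2 ** bits
    r = min(max(r, lo), hi)          # clip to the assumed reward range
    step = (hi - lo) / (levels - 1)  # grid spacing
    x = (r - lo) / step              # continuous position on the grid
    base = int(x)
    frac = x - base
    # round up with probability equal to the fractional part -> unbiased
    idx = base + (1 if random.random() < frac else 0)
    idx = min(idx, levels - 1)
    return idx, lo + idx * step
```

Averaged over many draws, the dequantized values concentrate around the true reward, which is why a no-regret MAB algorithm run on such quantized rewards can retain its regret guarantee up to the quantization noise.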
Pages: 773-788
Page count: 15
Related Papers
50 records in total
  • [1] Skyline Identification in Multi-Arm Bandits
    Cheu, Albert
    Sundaram, Ravi
    Ullman, Jonathan
    [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2018, : 1006 - 1010
  • [2] (Nearly) Optimal Differentially Private Stochastic Multi-Arm Bandits
    Mishra, Nikita
    Thakurta, Abhradeep
    [J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2015, : 592 - 601
  • [3] Efficient Client Selection Based on Contextual Combinatorial Multi-Arm Bandits
    Shi, Fang
    Lin, Weiwei
    Fan, Lisheng
    Lai, Xiazhi
    Wang, Xiumin
    [J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2023, 22 (08) : 5265 - 5277
  • [4] Implementation of Exploration in TONIC Using Non-stationary Volatile Multi-arm Bandits
    Shaha, Aditya
    Arya, Dhruv
    Tripathy, B. K.
    [J]. SOFT COMPUTING FOR PROBLEM SOLVING, SOCPROS 2018, VOL 1, 2020, 1048 : 239 - 250
  • [5] Multi-arm robots
    Buckingham, R
    [J]. INDUSTRIAL ROBOT, 1996, 23 (01): 16 - &
  • [6] FuzzyBandit: An Autonomous Personalized Model Based on Contextual Multi-Arm Bandits Using Explainable AI
    Bansal, Nipun
    Bala, Manju
    Sharma, Kapil
    [J]. DEFENCE SCIENCE JOURNAL, 2024, 74 (04) : 496 - 504
  • [8] Impedance control for multi-arm manipulation
    Caccavale, F
    Villani, L
    [J]. PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 2000, : 3465 - 3470
  • [9] Manipulability optimization for multi-arm teleoperation
    Kennel-Maushart, Florian
    Poranne, Roi
    Coros, Stelian
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 3956 - 3962
  • [10] Matching in Multi-arm Bandit with Collision
    Zhang, Yirui
    Wang, Siwei
    Fang, Zhixuan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,