Compression for Multi-Arm Bandits

Cited by: 0
Authors
Hanna, O. A. [1]
Yang, L. F. [1]
Fragouli, C. [1]
Affiliations
[1] University of California, Los Angeles, Electrical and Computer Engineering Department, Los Angeles, CA 90095
Keywords
communication constraints; compression; contextual bandits; distributed multi-armed bandits
DOI
10.1109/JSAIT.2023.3260770
CLC Classification
TN7 [basic electronic circuits]
Discipline Code
080902
Abstract
The multi-armed bandit (MAB) problem is one of the most well-known active learning frameworks. The aim is to select the best among a set of actions by sequentially observing rewards that come from an unknown distribution. Recently, a number of distributed bandit applications have become popular over wireless networks, where agents geographically separated from a learner collect and communicate the observed rewards. In this paper, we propose a compression scheme that compresses the rewards collected by the distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, which can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart. QuBan requires only a few bits (converging to as low as 3 bits as the number of iterations increases) to be sent per reward while preserving the same regret bound as uncompressed rewards. Our lower bound is established by constructing hard instances from a sub-Gaussian distribution. Our theory is further corroborated by numerical experiments. © 2020 IEEE.
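The abstract does not spell out QuBan's internals, but the core idea of sending a reward with only a few bits while preserving the learner's mean estimates can be illustrated with unbiased stochastic quantization. The sketch below is an illustrative assumption, not the paper's actual algorithm: it assumes rewards lie in a known range `[lo, hi]` and rounds each reward to one of `2**bits` grid levels, randomized so the dequantized value equals the true reward in expectation.

```python
import random

def stochastic_quantize(r, lo=-1.0, hi=1.0, bits=3):
    """Unbiased stochastic rounding of a reward onto a 2**bits-level grid.

    Illustrative sketch only (not the QuBan algorithm itself). Returns
    (index, dequantized value); the index is what an agent would transmit
    with `bits` bits, and the dequantized value equals r in expectation,
    so a bandit algorithm fed these values keeps unbiased mean estimates.
    """
    levels = 2 ** bits
    r = min(max(r, lo), hi)          # clip to the assumed reward range
    step = (hi - lo) / (levels - 1)  # grid spacing
    x = (r - lo) / step              # continuous position on the grid
    base = int(x)
    frac = x - base
    # round up with probability equal to the fractional part -> unbiased
    idx = base + (1 if random.random() < frac else 0)
    idx = min(idx, levels - 1)
    return idx, lo + idx * step
```

Averaged over many draws, the dequantized values concentrate around the true reward, which is why a no-regret MAB algorithm run on such quantized rewards can retain its regret guarantee up to the quantization noise.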
Pages: 773-788
Page count: 15
Related Papers
50 records in total
  • [1] Skyline Identification in Multi-Arm Bandits
    Cheu, Albert
    Sundaram, Ravi
    Ullman, Jonathan
    [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2018, : 1006 - 1010
  • [2] (Nearly) Optimal Differentially Private Stochastic Multi-Arm Bandits
    Mishra, Nikita
    Thakurta, Abhradeep
    [J]. UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2015, : 592 - 601
  • [3] Efficient Client Selection Based on Contextual Combinatorial Multi-Arm Bandits
    Shi, Fang
    Lin, Weiwei
    Fan, Lisheng
    Lai, Xiazhi
    Wang, Xiumin
    [J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2023, 22 (08) : 5265 - 5277
  • [4] Implementation of Exploration in TONIC Using Non-stationary Volatile Multi-arm Bandits
    Shaha, Aditya
    Arya, Dhruv
    Tripathy, B. K.
    [J]. SOFT COMPUTING FOR PROBLEM SOLVING, SOCPROS 2018, VOL 1, 2020, 1048 : 239 - 250
  • [5] Multi-arm robots
    Buckingham, R
    [J]. INDUSTRIAL ROBOT, 1996, 23 (01): 16 - &
  • [6] FuzzyBandit: An Autonomous Personalized Model Based on Contextual Multi-Arm Bandits Using Explainable AI
    Bansal, Nipun
    Bala, Manju
    Sharma, Kapil
    [J]. DEFENCE SCIENCE JOURNAL, 2024, 74 (04) : 496 - 504
  • [8] Impedance control for multi-arm manipulation
    Caccavale, F
    Villani, L
    [J]. PROCEEDINGS OF THE 39TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 2000, : 3465 - 3470
  • [9] Manipulability optimization for multi-arm teleoperation
    Kennel-Maushart, Florian
    Poranne, Roi
    Coros, Stelian
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 3956 - 3962
  • [10] Matching in Multi-arm Bandit with Collision
    Zhang, Yirui
    Wang, Siwei
    Fang, Zhixuan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,