Whittle index based Q-learning for restless bandits with average reward

Cited by: 16
Authors
Avrachenkov, Konstantin E. [1]
Borkar, Vivek S. [2]
Affiliations
[1] Inria Sophia Antipolis, F-06902 Valbonne, France
[2] Indian Inst Technol, Dept Elect Engn, Bombay 400076, India
Keywords
Discrete event system; Reinforcement learning; Restless bandits; Whittle index; Q-learning; Average reward; STOCHASTIC-APPROXIMATION; POLICY; INDEXABILITY; ALGORITHM
DOI
10.1016/j.automatica.2022.110186
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme. © 2022 Elsevier Ltd. All rights reserved.
Pages: 10
Related papers
50 in total
  • [1] Towards Q-learning the Whittle Index for Restless Bandits
    Fu, Jing
    Nazarathy, Yoni
    Moka, Sarat
    Taylor, Peter G.
    [J]. 2019 AUSTRALIAN & NEW ZEALAND CONTROL CONFERENCE (ANZCC), 2019 : 249 - 254
  • [2] On Learning Whittle Index Policy for Restless Bandits With Scalable Regret
    Akbarzadeh, Nima
    Mahajan, Aditya
    [J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (03) : 1190 - 1202
  • [3] Optimistic Whittle Index Policy: Online Learning for Restless Bandits
    Wang, Kai
    Xu, Lily
    Taneja, Aparna
    Tambe, Milind
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023 : 10131 - 10139
  • [4] QWI: Q-learning with Whittle Index
    Robledo F.
    Borkar V.
    Ayesta U.
    Avrachenkov K.
    [J]. Performance Evaluation Review, 2021, 49 (02) : 47 - 50
  • [5] Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation
    Xiong, Guojun
    Li, Jian
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [6] On the Whittle index of Markov modulated restless bandits
    Duran, S.
    Ayesta, U.
    Verloop, I. M.
    [J]. QUEUEING SYSTEMS, 2022, 102 (3-4) : 373 - 430
  • [7] On the Whittle index of Markov modulated restless bandits
    S. Duran
    U. Ayesta
    I. M. Verloop
    [J]. Queueing Systems, 2022, 102 : 373 - 430
  • [8] On the Whittle Index for Restless Multiarmed Hidden Markov Bandits
    Meshram, Rahul
    Manjunath, D.
    Gopalan, Aditya
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (09) : 3046 - 3053
  • [9] On the computation of Whittle’s index for Markovian restless bandits
    Urtzi Ayesta
    Manu K. Gupta
    Ina Maria Verloop
    [J]. Mathematical Methods of Operations Research, 2021, 93 : 179 - 208
  • [10] On the computation of Whittle's index for Markovian restless bandits
    Ayesta, Urtzi
    Gupta, Manu K.
    Verloop, Ina Maria
    [J]. MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2021, 93 (01) : 179 - 208