Whittle index based Q-learning for restless bandits with average reward

Cited by: 16

Authors:

Avrachenkov, Konstantin E. [1]

Borkar, Vivek S. [2]

Affiliations:

[1] Inria Sophia Antipolis, F-06902 Valbonne, France

[2] Indian Inst Technol, Dept Elect Engn, Bombay 400076, India

Source:

AUTOMATICA | 2022 / Vol. 139

Keywords:

Discrete event system; Reinforcement learning; Restless bandits; Whittle index; Q-learning; Average reward; STOCHASTIC-APPROXIMATION; POLICY; INDEXABILITY; ALGORITHM;

DOI:

10.1016/j.automatica.2022.110186

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme. (c) 2022 Elsevier Ltd. All rights reserved.


Pages: 10

50 items in total

[1] Towards Q-learning the Whittle Index for Restless Bandits
Fu, Jing
Nazarathy, Yoni
Moka, Sarat
Taylor, Peter G.
[J]. 2019 AUSTRALIAN & NEW ZEALAND CONTROL CONFERENCE (ANZCC), 2019, : 249 - 254
[2] On Learning Whittle Index Policy for Restless Bandits With Scalable Regret
Akbarzadeh, Nima
Mahajan, Aditya
[J]. IEEE TRANSACTIONS ON CONTROL OF NETWORK SYSTEMS, 2024, 11 (03): : 1190 - 1202
[3] Optimistic Whittle Index Policy: Online Learning for Restless Bandits
Wang, Kai
Xu, Lily
Taneja, Aparna
Tambe, Milind
[J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 10131 - 10139
[4] QWI: Q-learning with Whittle Index
Robledo F.
Borkar V.
Ayesta U.
Avrachenkov K.
[J]. Performance Evaluation Review, 2021, 49 (02): : 47 - 50
[5] Finite-Time Analysis of Whittle Index based Q-Learning for Restless Multi-Armed Bandits with Neural Network Function Approximation
Xiong, Guojun
Li, Jian
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[6] On the Whittle index of Markov modulated restless bandits
Duran, S.
Ayesta, U.
Verloop, I. M.
[J]. QUEUEING SYSTEMS, 2022, 102 (3-4) : 373 - 430
[7] On the Whittle index of Markov modulated restless bandits
S. Duran
U. Ayesta
I. M. Verloop
[J]. Queueing Systems, 2022, 102 : 373 - 430
[8] On the Whittle Index for Restless Multiarmed Hidden Markov Bandits
Meshram, Rahul
Manjunath, D.
Gopalan, Aditya
[J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2018, 63 (09) : 3046 - 3053
[9] On the computation of Whittle’s index for Markovian restless bandits
Urtzi Ayesta
Manu K. Gupta
Ina Maria Verloop
[J]. Mathematical Methods of Operations Research, 2021, 93 : 179 - 208
[10] On the computation of Whittle's index for Markovian restless bandits
Ayesta, Urtzi
Gupta, Manu K.
Verloop, Ina Maria
[J]. MATHEMATICAL METHODS OF OPERATIONS RESEARCH, 2021, 93 (01) : 179 - 208
