Sleeping Multi-Armed Bandit Learning for Fast Uplink Grant Allocation in Machine Type Communications

Cited by: 24
Authors
Ali, Samad [1 ]
Ferdowsi, Aidin [2 ]
Saad, Walid [2 ,3 ]
Rajatheva, Nandana [1 ]
Haapola, Jussi [1 ]
Affiliations
[1] Univ Oulu, Ctr Wireless Commun CWC, Oulu 90570, Finland
[2] Virginia Tech, Bradley Dept Elect & Comp Engn, Wireless VT, Blacksburg, VA 24061 USA
[3] Kyung Hee Univ, Dept Comp Sci & Engn, Seoul 130701, South Korea
Funding
U.S. National Science Foundation; Academy of Finland;
Keywords
Uplink; Resource management; Prediction algorithms; Delays; Quality of service; Wireless communication; Machine type communications; scheduling; fast uplink grant; multi-armed bandits; Internet of Things; RESOURCE-ALLOCATION; RANDOM-ACCESS; M2M COMMUNICATIONS; LTE; INTERNET;
DOI
10.1109/TCOMM.2020.2989338
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
Scheduling fast uplink grant transmissions for machine type communications (MTCs) is one of the main challenges of future wireless systems. In this paper, a novel fast uplink grant scheduling method based on the theory of multi-armed bandits (MABs) is proposed. First, a single quality-of-service metric is defined as a combination of the value of data packets, the maximum tolerable access delay, and the data rate. Since full knowledge of these metrics for all machine type devices (MTDs) cannot be known in advance at the base station (BS), and the set of active MTDs changes over time, the problem is modeled as a sleeping MAB with stochastic availability and a stochastic reward function. In particular, given that, at each time step, knowledge of the set of active MTDs is probabilistic, a novel probabilistic sleeping MAB algorithm is proposed to maximize the defined metric. An analysis of the regret is presented, and the effect of the prediction error of the source traffic prediction algorithm on the performance of the proposed sleeping MAB algorithm is investigated. Moreover, to enable fast uplink allocation for multiple MTDs at each time step, a novel method is proposed based on the concept of best arms ordering in the MAB setting. Simulation results show that the proposed framework yields a three-fold reduction in latency compared to a maximum probability scheduling policy, since it prioritizes the scheduling of MTDs that have stricter latency requirements. Moreover, by properly balancing the exploration-versus-exploitation tradeoff, the proposed algorithm selects the most important MTDs more often through exploitation. During exploration, sub-optimal MTDs are selected, which increases fairness in the system and also provides better estimates of the rewards of those sub-optimal MTDs.
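The sleeping-bandit setting described in the abstract — a learner that, at each step, may choose only among the currently active (awake) arms — can be illustrated with a generic sleeping variant of UCB1. This is a simplified sketch under stated assumptions (the awake set is known exactly each round, and standard UCB1 indices are used); it is not the paper's probabilistic sleeping-MAB algorithm, and the class name `SleepingUCB` is hypothetical.

```python
import math


class SleepingUCB:
    """Sketch of a sleeping multi-armed bandit using UCB1 indices.

    Only arms that are awake (available) in the current round are
    eligible; the learner plays the awake arm with the highest index.
    In the paper's setting, arms would correspond to MTDs and the
    reward to the combined quality-of-service metric.
    """

    def __init__(self, n_arms):
        self.counts = [0] * n_arms        # times each arm was played
        self.values = [0.0] * n_arms      # empirical mean reward per arm
        self.t = 0                        # global round counter

    def select(self, awake):
        """Pick an arm from the iterable of currently awake arm indices."""
        self.t += 1
        # First play any awake arm that has never been tried.
        for a in awake:
            if self.counts[a] == 0:
                return a
        # Otherwise maximize the UCB1 index over the awake arms.
        return max(awake, key=lambda a: self.values[a]
                   + math.sqrt(2 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        """Incrementally update the empirical mean of the played arm."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

As a usage sketch, a scheduler would call `select` each transmission-time interval with the (predicted) set of active MTDs and feed back the realized QoS reward via `update`; the exploration bonus ensures sub-optimal MTDs are still scheduled occasionally, which is the fairness effect the abstract mentions.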
Pages: 5072 - 5086
Number of pages: 15