Gaussian Process Reinforcement Learning for Fast Opportunistic Spectrum Access

Cited by: 7
Authors
Yan, Zun [1 ]
Cheng, Peng [1 ,2 ]
Chen, Zhuo [3 ]
Li, Yonghui [1 ]
Vucetic, Branka [1 ]
Affiliations
[1] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
[2] La Trobe Univ, Dept Comp Sci & Informat Technol, Melbourne, Vic 3086, Australia
[3] CSIRO DATA61, Marsfield, NSW 2122, Australia
Funding
Australian Research Council;
Keywords
Sensors; Correlation; Kernel; Gaussian processes; Learning (artificial intelligence); Radio frequency; Training; Opportunistic spectrum access; sensing policy; Gaussian process reinforcement learning (GPRL); machine learning; COGNITIVE RADIO NETWORKS; OPTIMALITY; DESIGN; BANDIT; MAC;
DOI
10.1109/TSP.2020.2986354
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic & Communication Technology];
Discipline Code
0808; 0809;
Abstract
Opportunistic spectrum access (OSA) is envisioned to support the spectrum demand of future-generation wireless networks. The majority of existing work assumes independent primary channels with known network dynamics. In practice, however, the channels are usually correlated and the network dynamics are unknown a priori. This poses a great challenge for sensing-policy design in spectrum-opportunity tracking, and the conventional partially observable Markov decision process (POMDP) formulation with model-based solutions is generally inapplicable. In this paper, we take a different approach and formulate the sensing-policy design as a time-series POMDP from a model-free perspective. To solve this time-series POMDP, we propose a novel Gaussian process reinforcement learning (GPRL) based solution that achieves accurate channel selection and a fast learning rate. In essence, the GP is embedded in RL as a Q-function approximator to efficiently exploit past learning experience. A novel kernel function is first tailor-designed to measure the correlation of time-series spectrum data. A covariance-based exploration strategy is then developed to enable proactive exploration for better policy learning. Finally, to adapt GPRL to multichannel sensing, we propose a novel action-trimming method that reduces the computational cost. Our simulation results show that the designed sensing policy outperforms existing ones and obtains near-optimal performance within a short learning phase.
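The core idea the abstract describes, a GP embedded in RL as a Q-function approximator whose posterior covariance drives exploration, can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `GPQApproximator`, the squared-exponential kernel (the paper designs a specialised time-series kernel instead), and the UCB-style exploration bonus are all illustrative assumptions.

```python
import numpy as np

class GPQApproximator:
    """Illustrative GP Q-function approximator (not the paper's design).

    A GP regresses Q-values over (state, action) feature vectors; the
    posterior variance supplies a covariance-based exploration bonus.
    """

    def __init__(self, length_scale=1.0, noise=0.1):
        self.length_scale = length_scale
        self.noise = noise
        self.X = None  # observed (state, action) feature rows
        self.y = None  # observed Q-value targets

    def _kernel(self, A, B):
        # Squared-exponential kernel; stand-in for the paper's
        # tailor-designed time-series spectrum kernel.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / self.length_scale ** 2)

    def update(self, x, q_target):
        # Append one (features, target) training pair.
        x = np.atleast_2d(np.asarray(x, dtype=float))
        t = np.atleast_1d(np.asarray(q_target, dtype=float))
        self.X = x if self.X is None else np.vstack([self.X, x])
        self.y = t if self.y is None else np.concatenate([self.y, t])

    def posterior(self, x_star):
        # Standard GP regression posterior mean and variance.
        x_star = np.atleast_2d(np.asarray(x_star, dtype=float))
        if self.X is None:
            return np.zeros(len(x_star)), np.ones(len(x_star))
        K = self._kernel(self.X, self.X) + self.noise ** 2 * np.eye(len(self.X))
        Ks = self._kernel(x_star, self.X)
        mean = Ks @ np.linalg.solve(K, self.y)
        var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
        return mean, np.maximum(var, 1e-12)

    def select_action(self, state, actions, beta=2.0):
        # Covariance-based exploration: maximise posterior mean plus
        # beta times posterior standard deviation (UCB-style).
        feats = np.array([np.concatenate([state, a]) for a in actions])
        mean, var = self.posterior(feats)
        return int(np.argmax(mean + beta * np.sqrt(var)))
```

In this sketch, sensing a channel would yield a reward used as the Q-target for `update`, and `select_action` would pick the next channel; the paper's action-trimming method for multichannel sensing would additionally prune the candidate `actions` list before evaluation, which is not reproduced here.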
Pages: 2613 - 2628
Page count: 16
Related Papers
50 records
  • [1] Gaussian Process Reinforcement Learning for Fast Opportunistic Spectrum Access
    Yan, Zun
    Cheng, Peng
    Chen, Zhuo
    Li, Yonghui
    Vucetic, Branka
    [J]. 2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM), 2019,
  • [2] Reinforcement learning application scenario for Opportunistic Spectrum Access
    Jouini, Wassim
    Bollenbach, Robin
    Guillet, Matthieu
    Moy, Christophe
    Nafkha, Amor
    [J]. 2011 IEEE 54TH INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2011,
  • [3] Reinforcement Learning Approaches and Evaluation Criteria for Opportunistic Spectrum Access
    Robert, Clement
    Moy, Christophe
    Wang, Cheng-Xiang
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2014, : 1508 - 1513
  • [4] Reinforcement Learning for Opportunistic Spectrum Access in Cognitive Radio Networks
    Zhao, Fie
    Qu, Daiming
    Zhong, Guohui
    Cao, Yang
    [J]. 2010 INTERNATIONAL CONFERENCE ON COMMUNICATION AND VEHICULAR TECHNOLOGY (ICCVT 2010), VOL I, 2010, : 116 - 120
  • [5] A Reinforcement-Learning Based Cognitive Scheme for Opportunistic Spectrum Access
    Angeliki V. Kordali
    Panayotis G. Cottis
    [J]. Wireless Personal Communications, 2016, 86 : 751 - 769
  • [6] A Reinforcement-Learning Based Cognitive Scheme for Opportunistic Spectrum Access
    Kordali, Angeliki V.
    Cottis, Panayotis G.
    [J]. WIRELESS PERSONAL COMMUNICATIONS, 2016, 86 (02) : 751 - 769
  • [7] Reinforcement Learning Demonstrator for Opportunistic Spectrum Access on Real Radio Signals
    Moy, Christophe
    Nafkha, Amor
    Naoues, Malek
    [J]. 2015 IEEE INTERNATIONAL SYMPOSIUM ON DYNAMIC SPECTRUM ACCESS NETWORKS (DYSPAN), 2015, : 283 - 284
  • [8] Reinforcement Learning with Budget-Constrained Nonparametric Function Approximation for Opportunistic Spectrum Access
    Tsiligkaridis, Theodoros
    Romero, David
    [J]. 2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2018), 2018, : 579 - 583
  • [9] Efficient Online Learning for Opportunistic Spectrum Access
    Dai, Wenhan
    Gai, Yi
    Krishnamachari, Bhaskar
    [J]. 2012 PROCEEDINGS IEEE INFOCOM, 2012, : 3086 - 3090
  • [10] Inverse Reinforcement Learning with Gaussian Process
    Qiao, Qifeng
    Beling, Peter A.
    [J]. 2011 AMERICAN CONTROL CONFERENCE, 2011, : 113 - 118