Gaussian Process Reinforcement Learning for Fast Opportunistic Spectrum Access

Cited by: 7
Authors
Yan, Zun [1 ]
Cheng, Peng [1 ,2 ]
Chen, Zhuo [3 ]
Li, Yonghui [1 ]
Vucetic, Branka [1 ]
Affiliations
[1] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia
[2] La Trobe Univ, Dept Comp Sci & Informat Technol, Melbourne, Vic 3086, Australia
[3] CSIRO DATA61, Marsfield, NSW 2122, Australia
Funding
Australian Research Council;
Keywords
Sensors; Correlation; Kernel; Gaussian processes; Learning (artificial intelligence); Radio frequency; Training; Opportunistic spectrum access; sensing policy; Gaussian process reinforcement learning (GPRL); machine learning; COGNITIVE RADIO NETWORKS; OPTIMALITY; DESIGN; BANDIT; MAC;
DOI
10.1109/TSP.2020.2986354
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Code
0808 ; 0809 ;
Abstract
Opportunistic spectrum access (OSA) is envisioned to support the spectrum demand of future-generation wireless networks. The majority of existing work assumes independent primary channels and known network dynamics. In practice, however, the channels are usually correlated and the network dynamics are unknown a priori. This poses a great challenge to sensing policy design for spectrum opportunity tracking, and the conventional partially observable Markov decision process (POMDP) formulation with model-based solutions is generally inapplicable. In this paper, we take a different approach and formulate the sensing policy design as a time-series POMDP from a model-free perspective. To solve this time-series POMDP, we propose a novel Gaussian process reinforcement learning (GPRL) based solution, which achieves accurate channel selection and a fast learning rate. In essence, the GP is embedded in RL as a Q-function approximator to efficiently utilize past learning experience. A novel kernel function is first tailor-designed to measure the correlation of time-series spectrum data. A covariance-based exploration strategy is then developed to enable proactive exploration for better policy learning. Finally, for GPRL to adapt to multichannel sensing, we propose a novel action-trimming method to reduce the computational cost. Our simulation results show that the designed sensing policy outperforms existing ones and obtains near-optimal performance within a short learning phase.
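The core idea sketched in the abstract can be illustrated in code: a Gaussian process regresses Q-values over (state, action) features, and the GP's predictive covariance drives a UCB-style exploration bonus. The toy two-channel environment, the RBF kernel, and all hyperparameters below are illustrative assumptions for a minimal sketch, not the authors' tailor-designed kernel or action-trimming method.

```python
# Minimal sketch of GP-based Q-function approximation with a
# covariance-based exploration bonus. Kernel, environment, and
# hyperparameters are illustrative assumptions, not the paper's design.
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X1, X2, length=1.0, var=1.0):
    """Squared-exponential kernel between rows of X1 and X2."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / length**2)

class GPQ:
    """GP regression over (state, action) features -> Q-value estimate."""
    def __init__(self, noise=0.1):
        self.noise = noise
        self.X = np.empty((0, 2))  # columns: [state feature, action index]
        self.y = np.empty(0)

    def fit(self, X, y):
        self.X, self.y = X, y
        K = rbf_kernel(X, X) + self.noise**2 * np.eye(len(X))
        self.K_inv = np.linalg.inv(K)

    def predict(self, Xs):
        if len(self.X) == 0:  # prior: zero mean, unit std
            return np.zeros(len(Xs)), np.ones(len(Xs))
        Ks = rbf_kernel(Xs, self.X)
        mu = Ks @ self.K_inv @ self.y
        # Predictive variance: prior variance minus the explained part.
        var = 1.0 - np.einsum('ij,jk,ik->i', Ks, self.K_inv, Ks)
        return mu, np.sqrt(np.clip(var, 1e-9, None))

# Toy 2-channel OSA environment: channel 1 is idle more often than channel 0.
p_free = np.array([0.3, 0.8])
def sense(action):
    return float(rng.random() < p_free[action])  # reward 1 if channel idle

gp, beta = GPQ(), 2.0
Xs, ys = [], []
state = 0.0  # last sensing outcome serves as a crude "state" feature
for t in range(200):
    cand = np.array([[state, a] for a in (0, 1)], float)
    mu, sd = gp.predict(cand)
    a = int(np.argmax(mu + beta * sd))  # covariance-based exploration
    r = sense(a)
    Xs.append([state, a]); ys.append(r)
    gp.fit(np.array(Xs), np.array(ys))
    state = r

mu, _ = gp.predict(np.array([[1.0, 0], [1.0, 1]], float))
print(mu)  # Q-estimates per channel; the freer channel should score higher
```

In a full GPRL treatment the squared-exponential kernel above would be replaced by a kernel measuring correlation of time-series spectrum data, and the candidate-action set would be pruned by action trimming; this sketch only shows where those pieces plug in.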
Pages: 2613-2628
Page count: 16