On the source-to-target gap of robust double deep Q-learning in digital twin-enabled wireless networks

Cited by: 2
Authors:
McManus, Maxwell [1 ]
Guan, Zhangyu [1 ]
Mastronarde, Nicholas [1 ]
Zou, Shaofeng [1 ]
Affiliations:
[1] Univ Buffalo, Dept Elect Engn, Buffalo, NY 14260 USA
Keywords:
Zero-touch Networks; Digital Twin; Reinforcement Learning; Domain Adaptation; Source-to-Target Gap; SIMULATION;
DOI: 10.1117/12.2618612
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
The digital twin has been envisioned as a key tool to enable data-driven real-time monitoring and prediction, automated modeling, and zero-touch control and optimization in next-generation wireless networks. However, because of the mismatch between the dynamics of the source domain (i.e., the digital twin) and the target domain (i.e., the real network), policies generated in the source domain by traditional machine learning algorithms may suffer significant performance degradation when applied in the target domain, the so-called "source-to-target (S2T) gap" problem. In this work we experimentally investigate the S2T gap in digital twin-enabled wireless networks, considering a new class of reinforcement learning algorithms referred to as robust deep reinforcement learning. We first design, based on a combination of double deep Q-learning and an R-contamination model, a robust learning framework that controls policy robustness against the adversarial dynamics expected in the target domain. We then test the robustness of the learning framework over UBSim, an event-driven universal simulator for broadband mobile wireless networks. The source domain is first constructed over UBSim by creating a virtual representation of an indoor testing environment at the University at Buffalo, and the target domain is then constructed by modifying the source domain in terms of blockage distribution, user locations, and other factors. We compare the robust learning algorithm with traditional reinforcement learning algorithms in the presence of controlled model mismatch between the source and target domains. Through these experiments we demonstrate that, with proper selection of the parameter R, robust learning algorithms can significantly reduce the S2T gap, whereas otherwise they can be either too conservative or too explorative. We observe that robust policy transfer is especially effective for target domains with time-varying blockage dynamics.
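The record does not spell out how the R-contamination model enters the double deep Q-learning update. Below is a minimal, hypothetical sketch of one common formulation of such a robust target: with probability R the adversary is assumed able to redirect the transition to a worst-case next state, approximated here by the minimum target-network value; the function name and this approximation are assumptions for illustration, not the authors' implementation.

```python
def robust_ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, R=0.1):
    """Sketch of a robust double-DQN target under an R-contamination model.

    next_q_online / next_q_target: per-action Q-values at the observed next
    state from the online and target networks, respectively.
    """
    # Double DQN: select the greedy action with the online network,
    # evaluate it with the target network (decouples selection/evaluation).
    a_star = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    nominal = next_q_target[a_star]
    # R-contamination: mix the nominal next-state value with a worst-case
    # value; R = 0 recovers the standard double-DQN target.
    worst = min(next_q_target)
    return reward + gamma * ((1.0 - R) * nominal + R * worst)
```

Setting R = 0 yields a purely nominal (non-robust) policy, while R close to 1 plans almost entirely against the worst case, matching the abstract's observation that the policy becomes too conservative or too explorative without proper selection of R.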
Pages: 12
Related papers
43 records in total
  • [31] Q-Learning based Edge Caching Optimization for D2D Enabled Hierarchical Wireless Networks
    Wang, Chenyang
    Wang, Shanjia
    Li, Ding
    Wang, Xiaofei
    Li, Xiuhua
    Leung, Victor C. M.
    2018 IEEE 15TH INTERNATIONAL CONFERENCE ON MOBILE AD HOC AND SENSOR SYSTEMS (MASS), 2018, : 55 - 63
  • [32] Double deep Q-learning network-based path planning in UAV-assisted wireless powered NOMA communication networks
    Lei, Ming
    Fowler, Scott
    Wang, Juzhen
    Zhang, Xingjun
    Yu, Bocheng
    Yu, Bin
    2021 IEEE 94TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2021-FALL), 2021,
  • [33] Deep Q-Learning based Resource Management in UAV-assisted Wireless Powered IoT Networks
    Li, Kai
    Ni, Wei
    Tovar, Eduardo
    Jamalipour, Abbas
    ICC 2020 - 2020 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2020,
  • [34] Two-Stage WECC Composite Load Modeling: A Double Deep Q-Learning Networks Approach
    Wang, Xinan
    Wang, Yishen
    Shi, Di
    Wang, Jianhui
    Wang, Zhiwei
    IEEE TRANSACTIONS ON SMART GRID, 2020, 11 (05) : 4331 - 4344
  • [35] Energy aware optimal routing model for wireless multimedia sensor networks using modified Voronoi assisted prioritized double deep Q-learning
    Suseela, Sellamuthu
    Krithiga, Ravi
    Revathi, Muthusamy
    Sudhakaran, Gajendran
    Bhavadharini, Reddiyapalayam Murugeshan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (06):
  • [36] DEN-DQL: Quick Convergent Deep Q-Learning with Double Exploration Networks for News Recommendation
    Song, Zhanghan
    Zhang, Dian
    Shi, Xiaochuan
    Li, Wei
    Ma, Chao
    Wu, Libing
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [37] Real-Time Data Transmission Scheduling Algorithm for Wireless Sensor Networks Based on Deep Q-Learning
    Zhang, Aiqi
    Sun, Meiyi
    Wang, Jiaqi
    Li, Zhiyi
    Cheng, Yanbo
    Wang, Cheng
    ELECTRONICS, 2022, 11 (12)
  • [38] Intelligent querying for target tracking in camera networks using deep Q-learning with n-step bootstrapping
    Sharma, Anil
    Anand, Saket
    Kaul, Sanjit K.
    IMAGE AND VISION COMPUTING, 2020, 103 (103)
  • [39] Multi-Agent Double Deep Q-Learning for Fairness in Multiple-Access Underlay Cognitive Radio Networks
    Ali, Zain
    Rezki, Zouheir
    Sadjadpour, Hamid
IEEE TRANSACTIONS ON MACHINE LEARNING IN COMMUNICATIONS AND NETWORKING, 2024, 2 : 580 - 595
  • [40] Deep Q-learning based sparse code multiple access for ultra reliable low latency communication in industrial wireless networks
    Bhardwaj, Sanjay
    Kim, Dong-Seong
    TELECOMMUNICATION SYSTEMS, 2023, 83 (04) : 409 - 421