On the source-to-target gap of robust double deep Q-learning in digital twin-enabled wireless networks

Cited by: 2
Authors:
McManus, Maxwell [1]
Guan, Zhangyu [1]
Mastronarde, Nicholas [1]
Zou, Shaofeng [1]
Affiliation:
[1] Univ Buffalo, Dept Elect Engn, Buffalo, NY 14260 USA
Keywords:
Zero-touch Networks; Digital Twin; Reinforcement Learning; Domain Adaptation; Source-to-Target Gap; Simulation
DOI:
10.1117/12.2618612
Chinese Library Classification:
TP18 [Theory of Artificial Intelligence]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
Digital twin has been envisioned as a key tool to enable data-driven real-time monitoring and prediction, automated modeling, as well as zero-touch control and optimization in next-generation wireless networks. However, because of the mismatch between the dynamics of the source domain (i.e., the digital twin) and the target domain (i.e., the real network), policies generated in the source domain by traditional machine learning algorithms may suffer significant performance degradation when applied in the target domain, the so-called "source-to-target (S2T) gap" problem. In this work we experimentally investigate the S2T gap in digital twin-enabled wireless networks considering a new class of reinforcement learning algorithms referred to as robust deep reinforcement learning. We first design, based on a combination of double deep Q-learning and an R-contamination model, a robust learning framework that controls policy robustness against the adversarial dynamics expected in the target domain. We then test the robustness of the framework over UBSim, an event-driven universal simulator for broadband mobile wireless networks. The source domain is constructed in UBSim as a virtual representation of an indoor testing environment at the University at Buffalo, and the target domain is then derived by modifying the source domain in terms of blockage distribution, user locations, and other parameters. We compare the robust learning algorithm with traditional reinforcement learning algorithms in the presence of controlled model mismatch between the source and target domains. Through experiments we demonstrate that, with proper selection of the parameter R, robust learning algorithms can significantly reduce the S2T gap, while otherwise they can be either too conservative or too explorative. We observe that robust policy transfer is especially effective for target domains with time-varying blockage dynamics.
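The abstract combines double deep Q-learning with an R-contamination model: with probability (1 - R) the environment follows the nominal (source-domain) dynamics, and with probability R an adversary may perturb the transition. Below is a minimal NumPy sketch of how such a robust double-DQN target could be formed; the approximation of the worst case by the minimum bootstrapped value over the sampled batch is an illustrative assumption, not necessarily the paper's exact construction.

```python
import numpy as np

def robust_ddqn_targets(rewards, next_q_online, next_q_target, gamma, R):
    """Robust double-DQN targets under an R-contamination model (sketch).

    rewards:        shape (B,)   immediate rewards for the batch
    next_q_online:  shape (B, A) online-network Q-values at next states
    next_q_target:  shape (B, A) target-network Q-values at next states
    gamma:          discount factor
    R:              contamination level in [0, 1]
    """
    # Double DQN: the online net selects the greedy action, the target
    # net evaluates it (decoupling selection from evaluation).
    greedy_actions = np.argmax(next_q_online, axis=1)
    nominal_values = next_q_target[np.arange(len(rewards)), greedy_actions]
    # Worst-case next-state value, approximated over the sampled batch:
    # the adversary may redirect the transition to the worst state.
    worst_value = nominal_values.min()
    # Mix nominal and adversarial continuations per the R-contamination model.
    return rewards + gamma * ((1.0 - R) * nominal_values + R * worst_value)
```

With R = 0 this reduces to the standard double-DQN target, while R = 1 bootstraps every transition from the worst observed next-state value, which illustrates the conservative-versus-explorative trade-off in R that the abstract reports.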
Pages: 12
Related papers (43 total):
  • [1] Digital twin-enabled dynamic scheduling with preventive maintenance using a double-layer Q-learning algorithm
    Yan, Qi
    Wang, Hongfeng
    Wu, Fang
    COMPUTERS & OPERATIONS RESEARCH, 2022, 144
  • [2] Digital twin-enabled adaptive scheduling strategy based on deep reinforcement learning
    Gan, XueMei
    Zuo, Ying
    Zhang, AnSi
    Li, ShaoBo
    Tao, Fei
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2023, 66 (07) : 1937 - 1951
  • [6] Double Deep Q-Learning Based Channel Estimation for Industrial Wireless Networks
    Bhardwaj, Sanjay
    Lee, Jae-Min
    Kim, Dong-Seong
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 1318 - 1320
  • [7] Blockchain-based Robust SDN Framework for Digital Twin-Enabled IoT Networks
    Bhardwaj, Aditya
    Chaudhary, Rajat
    Aslam, Anjum Mohd
    Budhiraja, Ishan
    2023 IEEE 98TH VEHICULAR TECHNOLOGY CONFERENCE, VTC2023-FALL, 2023,
  • [8] Digital twin-enabled self-evolved optical transceiver using deep reinforcement learning
    Li, Jin
    Wang, Danshi
    Zhang, Min
    Cui, Siheng
    OPTICS LETTERS, 2020, 45 (16) : 4654 - 4657
  • [9] A Q-Learning Based Target Coverage Algorithm for Wireless Sensor Networks
    Xiong, Peng
    He, Dan
    Lu, Tiankun
    MATHEMATICS, 2025, 13 (03)
  • [10] Q-learning Enabled Intelligent Energy Attack in Sustainable Wireless Communication Networks
    Li, Long
    Luo, Yu
    Pu, Lina
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,