On the source-to-target gap of robust double deep Q-learning in digital twin-enabled wireless networks

Cited by: 2
Authors:
McManus, Maxwell [1]
Guan, Zhangyu [1]
Mastronarde, Nicholas [1]
Zou, Shaofeng [1]
Affiliation:
[1] Univ Buffalo, Dept Elect Engn, Buffalo, NY 14260 USA
Keywords:
Zero-touch Networks; Digital Twin; Reinforcement Learning; Domain Adaptation; Source-to-Target Gap; Simulation
DOI:
10.1117/12.2618612
Chinese Library Classification:
TP18 [Theory of Artificial Intelligence]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
Digital twin has been envisioned as a key tool to enable data-driven real-time monitoring and prediction, automated modeling, as well as zero-touch control and optimization in next-generation wireless networks. However, because of the mismatch between the dynamics of the source domain (i.e., the digital twin) and the target domain (i.e., the real network), policies generated in the source domain by traditional machine learning algorithms may suffer significant performance degradation when applied in the target domain, the so-called "source-to-target (S2T) gap" problem. In this work we experimentally investigate the S2T gap in digital twin-enabled wireless networks considering a new class of reinforcement learning algorithms referred to as robust deep reinforcement learning. We first design, based on a combination of double deep Q-learning and an R-contamination model, a robust learning framework that controls policy robustness against the adversarial dynamics expected in the target domain. We then test the robustness of the framework over UBSim, an event-driven universal simulator for broadband mobile wireless networks. The source domain is constructed in UBSim as a virtual representation of an indoor testing environment at the University at Buffalo, and the target domain is then derived by modifying the source domain in terms of blockage distribution, user locations, and other parameters. We compare the robust learning algorithm with traditional reinforcement learning algorithms in the presence of controlled model mismatch between the source and target domains. Through experiments we demonstrate that, with proper selection of the parameter R, robust learning algorithms can significantly reduce the S2T gap, while otherwise they can be either too conservative or too explorative. We observe that robust policy transfer is especially effective for target domains with time-varying blockage dynamics.
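The abstract combines double deep Q-learning with an R-contamination model: with probability (1 - R) the environment follows the nominal (source-domain) dynamics, and with probability R an adversary may perturb the transition. Below is a minimal NumPy sketch of how such a robust double-DQN target could be formed; the approximation of the worst case by the minimum bootstrapped value over the sampled batch is an illustrative assumption, not necessarily the paper's exact construction.

```python
import numpy as np

def robust_ddqn_targets(rewards, next_q_online, next_q_target, gamma, R):
    """Robust double-DQN targets under an R-contamination model (sketch).

    rewards:        shape (B,)   immediate rewards for the batch
    next_q_online:  shape (B, A) online-network Q-values at next states
    next_q_target:  shape (B, A) target-network Q-values at next states
    gamma:          discount factor
    R:              contamination level in [0, 1]
    """
    # Double DQN: the online net selects the greedy action, the target
    # net evaluates it (decoupling selection from evaluation).
    greedy_actions = np.argmax(next_q_online, axis=1)
    nominal_values = next_q_target[np.arange(len(rewards)), greedy_actions]
    # Worst-case next-state value, approximated over the sampled batch:
    # the adversary may redirect the transition to the worst state.
    worst_value = nominal_values.min()
    # Mix nominal and adversarial continuations per the R-contamination model.
    return rewards + gamma * ((1.0 - R) * nominal_values + R * worst_value)
```

With R = 0 this reduces to the standard double-DQN target, while R = 1 bootstraps every transition from the worst observed next-state value, which illustrates the conservative-versus-explorative trade-off in R that the abstract reports.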
Pages: 12
Related papers (43 total):
  • [1] Digital twin-enabled dynamic scheduling with preventive maintenance using a double-layer Q-learning algorithm
    Yan, Qi
    Wang, Hongfeng
    Wu, Fang
    COMPUTERS & OPERATIONS RESEARCH, 2022, 144
  • [2] Digital twin-enabled adaptive scheduling strategy based on deep reinforcement learning
    Gan, XueMei
    Zuo, Ying
    Zhang, AnSi
    Li, ShaoBo
    Tao, Fei
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2023, 66 (07) : 1937 - 1951
  • [6] Double Deep Q-Learning Based Channel Estimation for Industrial Wireless Networks
    Bhardwaj, Sanjay
    Lee, Jae-Min
    Kim, Dong-Seong
    11TH INTERNATIONAL CONFERENCE ON ICT CONVERGENCE: DATA, NETWORK, AND AI IN THE AGE OF UNTACT (ICTC 2020), 2020, : 1318 - 1320
  • [7] Blockchain-based Robust SDN Framework for Digital Twin-Enabled IoT Networks
    Bhardwaj, Aditya
    Chaudhary, Rajat
    Aslam, Anjum Mohd
    Budhiraja, Ishan
    2023 IEEE 98TH VEHICULAR TECHNOLOGY CONFERENCE, VTC2023-FALL, 2023,
  • [8] Digital twin-enabled self-evolved optical transceiver using deep reinforcement learning
    Li, Jin
    Wang, Danshi
    Zhang, Min
    Cui, Siheng
    OPTICS LETTERS, 2020, 45 (16) : 4654 - 4657
  • [9] A Q-Learning Based Target Coverage Algorithm for Wireless Sensor Networks
    Xiong, Peng
    He, Dan
    Lu, Tiankun
    MATHEMATICS, 2025, 13 (03)
  • [10] Q-learning Enabled Intelligent Energy Attack in Sustainable Wireless Communication Networks
    Li, Long
    Luo, Yu
    Pu, Lina
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021), 2021,