On the source-to-target gap of robust double deep Q-learning in digital twin-enabled wireless networks

Cited by: 2
Authors:
McManus, Maxwell [1 ]
Guan, Zhangyu [1 ]
Mastronarde, Nicholas [1 ]
Zou, Shaofeng [1 ]
Affiliations:
[1] Univ Buffalo, Dept Elect Engn, Buffalo, NY 14260 USA
Keywords:
Zero-touch Networks; Digital Twin; Reinforcement Learning; Domain Adaptation; Source-to-Target Gap; SIMULATION;
DOI: 10.1117/12.2618612
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
The digital twin has been envisioned as a key tool to enable data-driven real-time monitoring and prediction, automated modeling, and zero-touch control and optimization in next-generation wireless networks. However, because of the mismatch between the dynamics of the source domain (i.e., the digital twin) and the target domain (i.e., the real network), policies generated in the source domain by traditional machine learning algorithms may suffer significant performance degradation when applied in the target domain, the so-called "source-to-target (S2T) gap" problem. In this work we experimentally investigate the S2T gap in digital twin-enabled wireless networks, considering a new class of reinforcement learning algorithms referred to as robust deep reinforcement learning. We first design, based on a combination of double deep Q-learning and an R-contamination model, a robust learning framework that controls policy robustness against the adversarial dynamics expected in the target domain. We then test the robustness of the learning framework over UBSim, an event-driven universal simulator for broadband mobile wireless networks. The source domain is first constructed over UBSim by creating a virtual representation of an indoor testing environment at the University at Buffalo, and the target domain is then constructed by modifying the source domain in terms of blockage distribution, user locations, and other factors. We compare the robust learning algorithm with traditional reinforcement learning algorithms in the presence of controlled model mismatch between the source and target domains. Through these experiments we demonstrate that, with proper selection of the parameter R, robust learning algorithms can significantly reduce the S2T gap, whereas otherwise they can be either too conservative or too explorative. We observe that robust policy transfer is especially effective for target domains with time-varying blockage dynamics.
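The record does not spell out how the R-contamination model enters the double deep Q-learning update. Below is a minimal, hypothetical sketch of one common formulation of such a robust target: with probability R the adversary is assumed able to redirect the transition to a worst-case next state, approximated here by the minimum target-network value; the function name and this approximation are assumptions for illustration, not the authors' implementation.

```python
def robust_ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, R=0.1):
    """Sketch of a robust double-DQN target under an R-contamination model.

    next_q_online / next_q_target: per-action Q-values at the observed next
    state from the online and target networks, respectively.
    """
    # Double DQN: select the greedy action with the online network,
    # evaluate it with the target network (decouples selection/evaluation).
    a_star = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    nominal = next_q_target[a_star]
    # R-contamination: mix the nominal next-state value with a worst-case
    # value; R = 0 recovers the standard double-DQN target.
    worst = min(next_q_target)
    return reward + gamma * ((1.0 - R) * nominal + R * worst)
```

Setting R = 0 yields a purely nominal (non-robust) policy, while R close to 1 plans almost entirely against the worst case, matching the abstract's observation that the policy becomes too conservative or too explorative without proper selection of R.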
Pages: 12
Related papers
43 records in total
  • [31] Q-Learning based Edge Caching Optimization for D2D Enabled Hierarchical Wireless Networks
    Wang, Chenyang
    Wang, Shanjia
    Li, Ding
    Wang, Xiaofei
    Li, Xiuhua
    Leung, Victor C. M.
    2018 IEEE 15TH INTERNATIONAL CONFERENCE ON MOBILE AD HOC AND SENSOR SYSTEMS (MASS), 2018, : 55 - 63
  • [32] Double deep Q-learning network-based path planning in UAV-assisted wireless powered NOMA communication networks
    Lei, Ming
    Fowler, Scott
    Wang, Juzhen
    Zhang, Xingjun
    Yu, Bocheng
    Yu, Bin
    2021 IEEE 94TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2021-FALL), 2021,
  • [33] Deep Q-Learning based Resource Management in UAV-assisted Wireless Powered IoT Networks
    Li, Kai
    Ni, Wei
    Tovar, Eduardo
    Jamalipour, Abbas
    ICC 2020 - 2020 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2020,
  • [34] Two-Stage WECC Composite Load Modeling: A Double Deep Q-Learning Networks Approach
    Wang, Xinan
    Wang, Yishen
    Shi, Di
    Wang, Jianhui
    Wang, Zhiwei
    IEEE TRANSACTIONS ON SMART GRID, 2020, 11 (05) : 4331 - 4344
  • [35] Energy aware optimal routing model for wireless multimedia sensor networks using modified Voronoi assisted prioritized double deep Q-learning
    Suseela, Sellamuthu
    Krithiga, Ravi
    Revathi, Muthusamy
    Sudhakaran, Gajendran
    Bhavadharini, Reddiyapalayam Murugeshan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2024, 36 (06):
  • [36] DEN-DQL: Quick Convergent Deep Q-Learning with Double Exploration Networks for News Recommendation
    Song, Zhanghan
    Zhang, Dian
    Shi, Xiaochuan
    Li, Wei
    Ma, Chao
    Wu, Libing
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [37] Real-Time Data Transmission Scheduling Algorithm for Wireless Sensor Networks Based on Deep Q-Learning
    Zhang, Aiqi
    Sun, Meiyi
    Wang, Jiaqi
    Li, Zhiyi
    Cheng, Yanbo
    Wang, Cheng
    ELECTRONICS, 2022, 11 (12)
  • [38] Intelligent querying for target tracking in camera networks using deep Q-learning with n-step bootstrapping
    Sharma, Anil
    Anand, Saket
    Kaul, Sanjit K.
    IMAGE AND VISION COMPUTING, 2020, 103 (103)
  • [39] Multi-Agent Double Deep Q-Learning for Fairness in Multiple-Access Underlay Cognitive Radio Networks
    Ali, Zain
    Rezki, Zouheir
    Sadjadpour, Hamid
IEEE TRANSACTIONS ON MACHINE LEARNING IN COMMUNICATIONS AND NETWORKING, 2024, 2 : 580 - 595
  • [40] Deep Q-learning based sparse code multiple access for ultra reliable low latency communication in industrial wireless networks
    Bhardwaj, Sanjay
    Kim, Dong-Seong
    TELECOMMUNICATION SYSTEMS, 2023, 83 (04) : 409 - 421