ON INFORMATION ASYMMETRY IN ONLINE REINFORCEMENT LEARNING

Times cited: 0
Authors
Tampubolon, Ezra [1]
Ceribasi, Haris [1]
Boche, Holger [1, 2]
Affiliations
[1] Tech Univ Munich, Lehrstuhl Theoret Informat Tech, Munich, Germany
[2] Munich Ctr Quantum Sci & Technol MCQST, Munich, Germany
Keywords
Information Asymmetry; Q-learning; Markov Game; Reinforcement Learning; Resource Allocation; Security
DOI
10.1109/ICASSP39728.2021.9413968
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work, we study a system of two interacting non-cooperative Q-learning agents in which one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which does not generally occur in an environment of independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical evidence that the resulting policies are almost welfare-optimal.
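The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or environment. It assumes a stateless repeated 2x2 game with a hypothetical payoff table, epsilon-greedy exploration, and hypothetical values for the learning rate and exploration rate. The only feature taken from the abstract is the asymmetry: the informed agent indexes its Q-table by the other agent's observed action, while the uninformed agent learns over its own actions only.

```python
import random


def run(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    """Two non-cooperative Q-learners with one-sided action observation.

    All numbers here (payoffs, alpha, eps, episodes) are illustrative
    assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    n_actions = 2
    # Hypothetical payoff table: payoff[a1][a2] = (reward of agent 1, reward of agent 2)
    payoff = [[(1.0, 1.0), (0.0, 2.0)],
              [(2.0, 0.0), (-1.0, -1.0)]]

    # Uninformed agent: one Q-value per own action.
    q1 = [0.0] * n_actions
    # Informed agent: Q-values indexed by (observed action of agent 1, own action).
    q2 = [[0.0] * n_actions for _ in range(n_actions)]

    def eps_greedy(q):
        # Explore uniformly with probability eps, otherwise act greedily.
        if rng.random() < eps:
            return rng.randrange(len(q))
        return max(range(len(q)), key=q.__getitem__)

    for _ in range(episodes):
        a1 = eps_greedy(q1)        # agent 1 acts without seeing agent 2
        a2 = eps_greedy(q2[a1])    # agent 2 observes a1 before choosing
        r1, r2 = payoff[a1][a2]
        # Stateless Q-learning updates (no next-state term in this sketch).
        q1[a1] += alpha * (r1 - q1[a1])
        q2[a1][a2] += alpha * (r2 - q2[a1][a2])

    return q1, q2


if __name__ == "__main__":
    q1, q2 = run()
    print("uninformed agent Q:", q1)
    print("informed agent Q:  ", q2)
```

Dropping the conditioning on `a1` in `q2` turns the pair into two ordinary independent learners, which is the baseline the abstract contrasts against.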
Pages: 4955-4959
Number of pages: 5