ON INFORMATION ASYMMETRY IN ONLINE REINFORCEMENT LEARNING

Cited by: 0

Authors:

Tampubolon, Ezra [1]

Ceribasi, Haris [1]

Boche, Holger [1,2]

Affiliations:

[1] Tech Univ Munich, Lehrstuhl Theoret Informat Tech, Munich, Germany

[2] Munich Ctr Quantum Sci & Technol MCQST, Munich, Germany

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Information Asymmetry; Q-learning; Markov Game; Reinforcement Learning; Resource Allocation; Security

DOI:

10.1109/ICASSP39728.2021.9413968

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this work, we study a system of two interacting non-cooperative Q-learning agents in which one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical indications that the resulting policies are almost welfare-optimal.
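The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or experiment: it assumes a stateless repeated 2x2 game instead of a Markov game, a bandit-style Q-update with no discounting, epsilon-greedy exploration, and a hypothetical payoff table. The names PAYOFF, q_uninformed, q_informed, and train are introduced here for illustration only. The one point it is meant to show is the asymmetry: the informed agent conditions its Q-values on the other agent's observed action, while the uninformed agent cannot.

```python
import random

# Hypothetical payoff table (an assumption, not taken from the paper):
# PAYOFF[a1][a2] -> (reward of uninformed agent, reward of informed agent)
PAYOFF = [[(3, 3), (0, 4)],
          [(4, 0), (1, 1)]]


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def train(steps=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Run two independent Q-learners with one-sided action observation."""
    rng = random.Random(seed)
    # Uninformed agent: one Q-value per own action.
    q_uninformed = [0.0, 0.0]
    # Informed agent: Q-values indexed by the observed action of the
    # other agent, then by its own action.
    q_informed = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        a1 = epsilon_greedy(q_uninformed, epsilon, rng)
        # The informed agent sees a1 before choosing its own action.
        a2 = epsilon_greedy(q_informed[a1], epsilon, rng)
        r1, r2 = PAYOFF[a1][a2]
        # Stateless Q-update (assumed simplification: no next-state term).
        q_uninformed[a1] += alpha * (r1 - q_uninformed[a1])
        q_informed[a1][a2] += alpha * (r2 - q_informed[a1][a2])
    return q_uninformed, q_informed


if __name__ == "__main__":
    q_u, q_i = train()
    print("uninformed agent Q-values:", q_u)
    print("informed agent Q-values:", q_i)
```

In this sketch the informed agent faces a fixed reward for each (observed action, own action) pair, so its learning target does not move with the other agent's policy. Whether and how this carries over to the Markov-game setting of the paper is not something the sketch establishes.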


Pages: 4955-4959

Number of pages: 5

50 records in total

[1] Online Reinforcement Learning for Self-adaptive Information Systems
Palm, Alexander
Metzger, Andreas
Pohl, Klaus
ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2020, 2020, 12127 : 169 - 184
[2] Online testing with reinforcement learning
Veanes, Margus
Roy, Pritam
Campbell, Colin
FORMAL APPROACHES TO SOFTWARE TESTING AND RUNTIME VERIFICATION, 2006, 4262 : 240 - +
[3] Online shielding for reinforcement learning
Koenighofer, Bettina
Rudolf, Julian
Palmisano, Alexander
Tappler, Martin
Bloem, Roderick
INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2023, 19 (04) : 379 - 394
[4] Online shielding for reinforcement learning
Bettina Könighofer
Julian Rudolf
Alexander Palmisano
Martin Tappler
Roderick Bloem
Innovations in Systems and Software Engineering, 2023, 19 : 379 - 394
[5] Online Sparse Reinforcement Learning
Hao, Botao
Lattimore, Tor
Szepesvari, Csaba
Wang, Mengdi
24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 316 - +
[6] Online Task Offloading in UDN: A Deep Reinforcement Learning Approach with Incomplete Information
Lin, Ziqi
Gu, Bo
Zhang, Xu
Yi, Difei
Han, Yu
2022 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2022, : 1236 - 1241
[7] Online learning of shaping rewards in reinforcement learning
Grzes, Marek
Kudenko, Daniel
NEURAL NETWORKS, 2010, 23 (04) : 541 - 550
[8] Compatibility and Information Asymmetry in Online Matching Platforms
Basu, Amit
Bhaskaran, Sreekumar
Mukherjee, Rajiv
MANAGEMENT SCIENCE, 2024, 70 (11) : 7730 - 7749
[9] Signaling theory and information asymmetry in online commerce
Mavlanova, Tamilla
Benbunan-Fich, Raquel
Koufaris, Marios
INFORMATION & MANAGEMENT, 2012, 49 (05) : 240 - 247
[10] Reinforcement learning in information searching
Cen, Yonghua
Gan, Liren
Bai, Chen
INFORMATION RESEARCH-AN INTERNATIONAL ELECTRONIC JOURNAL, 2013, 18 (01):
