ON INFORMATION ASYMMETRY IN ONLINE REINFORCEMENT LEARNING

Cited by: 0

Authors:

Tampubolon, Ezra [1]

Ceribasi, Haris [1]

Boche, Holger [1,2]

Affiliations:

[1] Tech Univ Munich, Lehrstuhl Theoret Informat Tech, Munich, Germany

[2] Munich Ctr Quantum Sci & Technol MCQST, Munich, Germany

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021

Keywords:

Information Asymmetry; Q-learning; Markov Game; Reinforcement Learning; Resource Allocation; SECURITY;

DOI:

10.1109/ICASSP39728.2021.9413968

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification code:

070206 ; 082403 ;

Abstract:

In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which does not occur in an environment of general independent learners. Furthermore, we discuss the resulting post-learning policies, show that they are almost optimal in the sense of the underlying game, and provide numerical hints that the resulting policies are almost welfare-optimal.
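
The setting in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a stateless repeated game rather than a Markov game, and the function name `train`, the two-channel collision payoffs, and the values of `alpha` and `epsilon` are all hypothetical choices made for illustration. The uninformed agent A learns Q-values over its own actions only, while the informed agent B indexes its Q-values by A's observed action.

```python
import random


def train(payoff_a, payoff_b, episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Stateless two-agent Q-learning with one-sided action observation.

    Assumption: a repeated matrix game, not the Markov game of the paper.
    payoff_x[a][b] is the reward of agent x when A plays a and B plays b.
    """
    rng = random.Random(seed)
    n_a, n_b = len(payoff_a), len(payoff_a[0])
    q_a = [0.0] * n_a                        # A: Q over its own actions only
    q_b = [[0.0] * n_b for _ in range(n_a)]  # B: Q over (A's action, own action)

    def eps_greedy(values):
        # Explore uniformly with probability epsilon, otherwise act greedily.
        if rng.random() < epsilon:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)

    for _ in range(episodes):
        a = eps_greedy(q_a)       # A acts without seeing B's action
        b = eps_greedy(q_b[a])    # B observes A's action before choosing
        q_a[a] += alpha * (payoff_a[a][b] - q_a[a])
        q_b[a][b] += alpha * (payoff_b[a][b] - q_b[a][b])
    return q_a, q_b


# Hypothetical two-channel example: a collision (same channel) pays 0,
# distinct channels pay 1 to each agent.
collision = [[0.0, 1.0], [1.0, 0.0]]
q_a, q_b = train(collision, collision)
best_reply = [max(range(2), key=row.__getitem__) for row in q_b]
```

In this toy game `best_reply` lists, for each action of A, the action B has learned to prefer. Because B conditions on A's action, its learning target does not shift as A's policy changes, which is one intuition for the stabilizing effect the abstract attributes to information asymmetry.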

Pages: 4955 - 4959

Number of pages: 5

50 items in total

[41] Reinforcement Learning for Online Industrial Process Control
Govindhasamy, James J.
McLoone, Sean F.
Irwin, George W.
French, John J.
Doyle, Richard P.
JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2005, 9 (01) : 23 - 30
[42] Tank War Using Online Reinforcement Learning
Andersen, Kresten Toftgaard
Zeng, Yifeng
Christensen, Dennis Dahl
Tran, Dung
2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 2, 2009, : 497 - 500
[43] ONLINE REINFORCEMENT LEARNING FOR MULTIMEDIA BUFFER CONTROL
Mastronarde, Nicholas
van der Schaar, Mihaela
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 1958 - 1961
[44] Evolving Neural Networks for Online Reinforcement Learning
Metzen, Jan Hendrik
Edgington, Mark
Kassahun, Yohannes
Kirchner, Frank
PARALLEL PROBLEM SOLVING FROM NATURE - PPSN X, PROCEEDINGS, 2008, 5199 : 518 - +
[45] An online reinforcement learning approach for HVAC control
Solinas, Francesco M.
Macii, Alberto
Patti, Edoardo
Bottaccioli, Lorenzo
EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
[46] Online reinforcement learning for adaptive interference coordination
Alcaraz, Juan J.
Ayala-Romero, Jose A.
Vales-Alonso, Javier
Losilla-Lopez, Fernando
TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES, 2020, 31 (10)
[47] Online Adaptation of Deep Architectures with Reinforcement Learning
Ganegedara, Thushan
Ott, Lionel
Ramos, Fabio
ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 577 - 585
[48] Incremental Stochastic Factorization for Online Reinforcement Learning
Barreto, Andre M. S.
Beirigo, Rafael L.
Pineau, Joelle
Precup, Doina
THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1468 - 1475
[49] Online Learning with Side Information
Xu, Xiao
Vakili, Sattar
Zhao, Qing
Swami, Ananthram
MILCOM 2017 - 2017 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM), 2017, : 303 - 308
[50] Reinforcement Learning with Bounded Information Loss
Peters, Jan
Muelling, Katharina
Seldin, Yevgeny
Altun, Yasemin
BAYESIAN INFERENCE AND MAXIMUM ENTROPY METHODS IN SCIENCE AND ENGINEERING, 2010, 1305 : 365 - 372
