Exploration and Incentives in Reinforcement Learning

Times cited: 0
Authors
Simchowitz, Max [1 ]
Slivkins, Aleksandrs [2 ]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] Microsoft Res NYC, New York, NY 10012 USA
Keywords
incentivized exploration; exploration-exploitation tradeoff; mechanism design; information design; information asymmetry; Bayesian incentive-compatibility; reinforcement learning; Markov decision processes; MULTIARMED BANDIT; DESIGN; ALGORITHM
DOI
10.1287/opre.2022.0495
Chinese Library Classification
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
How do you incentivize self-interested agents to explore when they prefer to exploit? We consider complex exploration problems, where each agent faces the same (but unknown) Markov decision process (MDP). In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously. To the best of our knowledge, this is the first work to consider mechanism design in a stateful, reinforcement learning setting.
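The abstract's key lever is information asymmetry: agents see only their own recommendation, so a planner can hide occasional exploration steps inside otherwise-exploitative recommendations. The sketch below illustrates this idea in the static, stateless (two-armed bandit) setting the paper compares against, not the paper's MDP algorithm itself; the recommendation rule, the `epsilon` mixing rate, and all names are hypothetical choices for illustration.

```python
import random

def recommend(history, epsilon=0.05):
    """Hypothetical planner's recommendation rule (illustrative sketch).

    With probability 1 - epsilon, recommend the empirically best arm
    (exploitation); with probability epsilon, recommend the
    under-sampled arm (hidden exploration). An agent sees only its
    own recommendation, so it cannot tell whether it is being used
    to explore; this information asymmetry is what makes following
    the recommendation incentive-compatible when epsilon is small.
    """
    counts = [0, 0]
    sums = [0.0, 0.0]
    for arm, reward in history:
        counts[arm] += 1
        sums[arm] += reward
    means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(2)]
    exploit_arm = max(range(2), key=lambda a: means[a])
    explore_arm = min(range(2), key=lambda a: counts[a])
    return explore_arm if random.random() < epsilon else exploit_arm

# Simulate a stream of self-interested agents, each arriving once
# and pulling the arm the planner recommends.
random.seed(0)
true_means = [0.4, 0.6]          # unknown to planner and agents
history = []
for _ in range(2000):
    arm = recommend(history)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    history.append((arm, reward))

pulls = [sum(1 for a, _ in history if a == k) for k in range(2)]
print(pulls)  # both arms get sampled; the better arm ends up dominating
```

The point of the sketch is only that every arm is eventually explored even though each individual agent is happy to comply; the paper's contribution is extending this kind of guarantee from stateless bandits to all reachable states of an unknown MDP.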
Pages: 17
Related papers
50 records in total
  • [1] Reward learning: Reinforcement, incentives, and expectations
    Berridge, KC
    [J]. PSYCHOLOGY OF LEARNING AND MOTIVATION: ADVANCES IN RESEARCH AND THEORY, VOL 40, 2001, 40 : 223 - 278
  • [2] Exploration Entropy for Reinforcement Learning
    Xin, Bo
    Yu, Haixu
    Qin, You
    Tang, Qing
    Zhu, Zhangqing
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [3] Exploration in Structured Reinforcement Learning
    Ok, Jungseul
    Proutiere, Alexandre
    Tranos, Damianos
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [4] Latent Exploration for Reinforcement Learning
    Chiappa, Alberto Silvio
    Vargas, Alessandro Marin
    Huang, Ann Zixiang
    Mathis, Alexander
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Conservative Exploration in Reinforcement Learning
    Garcelon, Evrard
    Ghavamzadeh, Mohammad
    Lazaric, Alessandro
    Pirotta, Matteo
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1431 - 1440
  • [6] Backtracking Exploration for Reinforcement Learning
    Chen, Xingguo
    Chen, Zening
    Sun, Dingyuanhao
    Gao, Yang
    [J]. 2023 5TH INTERNATIONAL CONFERENCE ON DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2023, 2023,
  • [7] Exploration by Distributional Reinforcement Learning
    Tang, Yunhao
    Agrawal, Shipra
    [J]. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2710 - 2716
  • [8] Reinforcement learning with inertial exploration
    Bergeron, Dany
    Desjardins, Charles
    Laurnnier, Julien
    Chaib-draa, Brahim
    [J]. PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY (IAT 2007), 2007, : 277 - +
  • [9] Bayesian Reinforcement Learning with Exploration
    Lattimore, Tor
    Hutter, Marcus
    [J]. ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 170 - 184
  • [10] Learning-Driven Exploration for Reinforcement Learning
    Usama, Muhammad
    Chang, Dong Eui
    [J]. 2021 21ST INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2021), 2021, : 1146 - 1151