Exploration and Incentives in Reinforcement Learning

Times cited: 0
Authors
Simchowitz, Max [1 ]
Slivkins, Aleksandrs [2 ]
Affiliations
[1] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[2] Microsoft Res NYC, New York, NY 10012 USA
Keywords
incentivized exploration; exploration-exploitation tradeoff; mechanism design; information design; information asymmetry; Bayesian incentive-compatibility; reinforcement learning; Markov decision processes; MULTIARMED BANDIT; DESIGN; ALGORITHM
DOI
10.1287/opre.2022.0495
Chinese Library Classification
C93 [Management Science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
How do you incentivize self-interested agents to explore when they prefer to exploit? We consider complex exploration problems, where each agent faces the same (but unknown) Markov decision process (MDP). In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the flow of information, and can incentivize the agents to explore via information asymmetry. We design an algorithm which explores all reachable states in the MDP. We achieve provable guarantees similar to those for incentivizing exploration in static, stateless exploration problems studied previously. To the best of our knowledge, this is the first work to consider mechanism design in a stateful, reinforcement learning setting.
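The abstract's key lever is information asymmetry: agents see only their own recommendation, so a planner can hide occasional exploration steps inside otherwise-exploitative recommendations. The sketch below illustrates this idea in the static, stateless (two-armed bandit) setting the paper compares against, not the paper's MDP algorithm itself; the recommendation rule, the `epsilon` mixing rate, and all names are hypothetical choices for illustration.

```python
import random

def recommend(history, epsilon=0.05):
    """Hypothetical planner's recommendation rule (illustrative sketch).

    With probability 1 - epsilon, recommend the empirically best arm
    (exploitation); with probability epsilon, recommend the
    under-sampled arm (hidden exploration). An agent sees only its
    own recommendation, so it cannot tell whether it is being used
    to explore; this information asymmetry is what makes following
    the recommendation incentive-compatible when epsilon is small.
    """
    counts = [0, 0]
    sums = [0.0, 0.0]
    for arm, reward in history:
        counts[arm] += 1
        sums[arm] += reward
    means = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(2)]
    exploit_arm = max(range(2), key=lambda a: means[a])
    explore_arm = min(range(2), key=lambda a: counts[a])
    return explore_arm if random.random() < epsilon else exploit_arm

# Simulate a stream of self-interested agents, each arriving once
# and pulling the arm the planner recommends.
random.seed(0)
true_means = [0.4, 0.6]          # unknown to planner and agents
history = []
for _ in range(2000):
    arm = recommend(history)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    history.append((arm, reward))

pulls = [sum(1 for a, _ in history if a == k) for k in range(2)]
print(pulls)  # both arms get sampled; the better arm ends up dominating
```

The point of the sketch is only that every arm is eventually explored even though each individual agent is happy to comply; the paper's contribution is extending this kind of guarantee from stateless bandits to all reachable states of an unknown MDP.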
Pages: 17
Related papers
50 records in total
  • [1] Reward learning: Reinforcement, incentives, and expectations
    Berridge, KC
    [J]. PSYCHOLOGY OF LEARNING AND MOTIVATION: ADVANCES IN RESEARCH AND THEORY, VOL 40, 2001, 40 : 223 - 278
  • [2] Exploration Entropy for Reinforcement Learning
    Xin, Bo
    Yu, Haixu
    Qin, You
    Tang, Qing
    Zhu, Zhangqing
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [3] Exploration in Structured Reinforcement Learning
    Ok, Jungseul
    Proutiere, Alexandre
    Tranos, Damianos
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [4] Latent Exploration for Reinforcement Learning
    Chiappa, Alberto Silvio
    Vargas, Alessandro Marin
    Huang, Ann Zixiang
    Mathis, Alexander
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [5] Conservative Exploration in Reinforcement Learning
    Garcelon, Evrard
    Ghavamzadeh, Mohammad
    Lazaric, Alessandro
    Pirotta, Matteo
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1431 - 1440
  • [6] Backtracking Exploration for Reinforcement Learning
    Chen, Xingguo
    Chen, Zening
    Sun, Dingyuanhao
    Gao, Yang
    [J]. 2023 5TH INTERNATIONAL CONFERENCE ON DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2023, 2023,
  • [7] Exploration by Distributional Reinforcement Learning
    Tang, Yunhao
    Agrawal, Shipra
    [J]. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 2710 - 2716
  • [8] Reinforcement learning with inertial exploration
    Bergeron, Dany
    Desjardins, Charles
    Laurnnier, Julien
    Chaib-draa, Brahim
    [J]. PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY (IAT 2007), 2007, : 277 - +
  • [9] Bayesian Reinforcement Learning with Exploration
    Lattimore, Tor
    Hutter, Marcus
    [J]. ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 170 - 184
  • [10] Learning-Driven Exploration for Reinforcement Learning
    Usama, Muhammad
    Chang, Dong Eui
    [J]. 2021 21ST INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2021), 2021, : 1146 - 1151