Towards Learning from Implicit Human Reward

Authors
Li, Guangliang [1, 2]
Dibeklioglu, Hamdi [3]
Whiteson, Shimon [4]
Hung, Hayley [3]
Affiliations
[1] Ocean Univ China, Qingdao, Shandong, Peoples R China
[2] Univ Amsterdam, Amsterdam, Netherlands
[3] Delft Univ Technol, Delft, Netherlands
[4] Univ Oxford, Oxford, England
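Keywords
Reinforcement learning; human-agent interaction
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract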
The TAMER framework provides a way for agents to learn to solve tasks using human-generated rewards. Previous research showed that humans give copious feedback early in training but very sparsely thereafter and that an agent's competitive feedback - informing the trainer about its performance relative to other trainers - can greatly affect the trainer's engagement and the agent's learning. In this paper, we present the first large-scale study of TAMER, involving 561 subjects, which investigates the effect of the agent's competitive feedback in a new setting as well as the potential for learning from trainers' facial expressions. Our results show for the first time that a TAMER agent can successfully learn to play Infinite Mario, a challenging reinforcement-learning benchmark problem. In addition, our study supports prior results demonstrating the importance of bi-directional feedback and competitive elements in the training interface. Finally, our results shed light on the potential for using trainers' facial expressions as reward signals, as well as the role of age and gender in trainer behavior and agent performance.
Pages: 1353-1354
Number of pages: 2
    [J]. CURRENT OPINION IN NEUROBIOLOGY, 2004, 14 (06) : 769 - 776