Task Transfer by Preference-Based Cost Learning

Cited: 0
Authors
Jing, Mingxuan [1]
Ma, Xiaojian [1]
Huang, Wenbing [2]
Sun, Fuchun [1]
Liu, Huaping [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Natl Lab Informat Sci & Technol TNList, Beijing 100084, Peoples R China
[2] Tencent AI Lab, Shenzhen, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
The goal of task transfer in reinforcement learning is to migrate an agent's action policy from a source task to a target task. Despite their success in robotic action planning, current methods mostly rely on one of two strong requirements: exactly relevant expert demonstrations, or an explicitly coded cost function for the target task; both, however, are inconvenient to obtain in practice. In this paper, we relax these two conditions by developing a novel task transfer framework in which expert preference serves as the guidance. In particular, we alternate between the following two steps. First, experts apply predefined preference rules to select expert demonstrations relevant to the target task. Second, based on the selection result, we learn the target cost function and trajectory distribution simultaneously via an enhanced Adversarial MaxEnt IRL, and generate more trajectories from the learned target distribution for the next round of preference selection. We provide theoretical analysis of the distribution learning and of the convergence of the proposed algorithm. Extensive simulations on several benchmarks further verify the effectiveness of the proposed method.
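The alternating scheme described in the abstract can be summarized as a short loop. This is a minimal sketch, not the authors' implementation: the callables `prefer`, `irl_update`, and `sample` are hypothetical stand-ins for the expert's preference-based selection rule, the enhanced Adversarial MaxEnt IRL update, and sampling from the learned target trajectory distribution, respectively.

```python
def task_transfer(source_demos, prefer, irl_update, sample, n_rounds=10):
    """Alternate preference-based selection with cost/distribution learning.

    prefer(pool)      -> subset of trajectories the expert prefers for the
                         target task (step 1: preference selection).
    irl_update(sel)   -> model holding the learned target cost function and
                         trajectory distribution (step 2: enhanced
                         Adversarial MaxEnt IRL, abstracted away here).
    sample(model, n)  -> n new trajectories drawn from the learned target
                         distribution, fed into the next selection round.
    """
    pool = list(source_demos)   # start from the source-task demonstrations
    model = None
    for _ in range(n_rounds):
        selected = prefer(pool)                    # step 1
        model = irl_update(selected)               # step 2
        # grow the candidate pool with freshly sampled trajectories
        pool = selected + sample(model, len(selected))
    return model
```

The loop only fixes the control flow; the learning signal comes entirely from the preference rule and the IRL update, which is why neither target-task demonstrations nor a hand-coded target cost function is required.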
Pages: 2471-2478 (8 pages)