Learning Intention-Aware Policies in Deep Reinforcement Learning

Authors
Zhao, T. [1]
Wu, S. [1]
Li, G. [1]
Chen, Y. [1]
Niu, G. [2]
Sugiyama, Masashi [2, 3]
Affiliations
[1] Tianjin Univ Sci & Technol, Coll Artificial Intelligence, Tianjin 300457, Peoples R China
[2] RIKEN Ctr Adv Intelligence Project, Tokyo 1030027, Japan
[3] Univ Tokyo, Grad Sch Frontier Sci, Tokyo 2778561, Japan
Funding
National Natural Science Foundation of China;
Keywords
INFORMATION; CURIOSITY;
DOI
10.1162/neco_a_01607
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Deep reinforcement learning (DRL) provides an agent with an optimal policy so as to maximize the cumulative rewards. The policy defined in DRL mainly depends on the state, historical memory, and policy model parameters. However, we humans usually take actions according to our own intentions, such as moving fast or slow, in addition to the elements included in traditional policy models. To make the action-choosing mechanism more similar to that of humans and to have the agent select actions that incorporate intentions, we propose an intention-aware policy learning method in this letter. To formalize this process, we first define an intention-aware policy by incorporating the intention information into the policy model, which is learned by maximizing the cumulative rewards together with the mutual information (MI) between the intention and the action. Then we derive an approximation of the MI objective that can be optimized efficiently. Finally, we demonstrate the effectiveness of the intention-aware policy in the classical MuJoCo control task and the multigoal continuous chain walking task.
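To make the MI term in the abstract concrete, the sketch below computes the mutual information between a discrete intention and a discrete action from a joint probability table, and adds it to an expected return with a weight. This is only an illustration under stated assumptions: the discrete setting, the additive form, the weight `beta`, and the function names are all hypothetical, and the letter's own approximation of the MI objective is not reproduced here.

```python
import math


def mutual_information(joint):
    """Return I(Z; A) in nats for a joint table joint[z][a] that sums to 1.

    Z stands for the intention and A for the action (illustrative naming).
    """
    p_z = [sum(row) for row in joint]        # marginal over intentions
    p_a = [sum(col) for col in zip(*joint)]  # marginal over actions
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (p_z[i] * p_a[j]))
    return mi


def regularized_objective(expected_return, joint, beta=0.1):
    """Hypothetical objective: expected return plus beta times I(Z; A).

    The additive form and the weight beta are assumptions made for
    illustration, not the formulation used in the letter.
    """
    return expected_return + beta * mutual_information(joint)


# The intention fully determines the action: MI equals log(2) nats.
dependent = [[0.5, 0.0], [0.0, 0.5]]
# The action ignores the intention: MI equals 0.
independent = [[0.25, 0.25], [0.25, 0.25]]

mi_dependent = mutual_information(dependent)
mi_independent = mutual_information(independent)
```

With equal expected returns, a policy whose actions depend on the intention scores higher under such an objective than one whose actions ignore it, which is the intuition behind rewarding MI between intention and action.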
Pages: 1657-1677
Number of pages: 21