Model-Free Deep Inverse Reinforcement Learning by Logistic Regression

被引:23
|
作者
Uchibe, Eiji [1 ,2 ]
机构
[1] Dept Brain Robot Interface, ATR Computat Neurosci Labs, 2-2-2 Hikaridai, Seika, Kyoto 6190288, Japan
[2] Grad Univ, Okinawa Inst Sci & Technol, Neural Computat Unit, 1919-1 Tancha, Onna Son, Okinawa 9040495, Japan
关键词
Inverse reinforcement learning; Deep learning; Density ratio estimation; Logistic regression; OTHELLO; GAME;
D O I
10.1007/s11063-017-9702-7
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. We formulate inverse reinforcement learning as a problem of density ratio estimation, and show that the log of the ratio between an optimal state transition and a baseline one is given by a part of reward and the difference of the value functions under the framework of linearly solvable Markov decision processes. The logarithm of density ratio is efficiently calculated by binomial logistic regression, of which the classifier is constructed by the reward and state value function. The classifier tries to discriminate between samples drawn from the optimal state transition probability and those from the baseline one. Then, the estimated state value function is used to initialize the part of the deep neural networks for forward reinforcement learning. The proposed deep forward and inverse reinforcement learning is applied into two benchmark games: Atari 2600 and Reversi. Simulation results show that our method reaches the best performance substantially faster than the standard combination of forward and inverse reinforcement learning as well as behavior cloning.
引用
收藏
页码:891 / 905
页数:15
相关论文
共 50 条
  • [41] Model-Free Reinforcement Learning with Continuous Action in Practice
    Degris, Thomas
    Pilarski, Patrick M.
    Sutton, Richard S.
    [J]. 2012 AMERICAN CONTROL CONFERENCE (ACC), 2012, : 2177 - 2182
  • [42] Driving in Dense Traffic with Model-Free Reinforcement Learning
    Saxena, Dhruv Mauria
    Bae, Sangjae
    Nakhaei, Alireza
    Fujimura, Kikuo
    Likhachev, Maxim
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 5385 - 5392
  • [43] MODEL-FREE ONLINE REINFORCEMENT LEARNING OF A ROBOTIC MANIPULATOR
    Sweafford, Jerry, Jr.
    Fahimi, Farbod
    [J]. MECHATRONIC SYSTEMS AND CONTROL, 2019, 47 (03): : 136 - 143
  • [44] Robotic Table Tennis with Model-Free Reinforcement Learning
    Gao, Wenbo
    Graesser, Laura
    Choromanski, Krzysztof
    Song, Xingyou
    Lazic, Nevena
    Sanketi, Pannag
    Sindhwani, Vikas
    Jaitly, Navdeep
    [J]. 2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 5556 - 5563
  • [45] Sim-to-Real Model-Based and Model-Free Deep Reinforcement Learning for Tactile Pushing
    Yang, Max
    Lin, Yijiong
    Church, Alex
    Lloyd, John
    Zhang, Dandan
    Barton, David A. W.
    Lepora, Nathan F.
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (09) : 5480 - 5487
  • [46] Model-Free and Model-Based Active Learning for Regression
    O'Neill, Jack
    Delany, Sarah Jane
    MacNamee, Brian
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE SYSTEMS, 2017, 513 : 375 - 386
  • [47] Model-free learning control of neutralization processes using reinforcement learning
    Syafiie, S.
    Tadeo, F.
    Martinez, E.
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2007, 20 (06) : 767 - 782
  • [48] Multi-Agent Pattern Formation: a Distributed Model-Free Deep Reinforcement Learning Approach
    Diallo, Elhadji Amadou Oury
    Sugawara, Toshiharu
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [49] FlexPool: A Distributed Model-Free Deep Reinforcement Learning Algorithm for Joint Passengers and Goods Transportation
    Manchella, Kaushik
    Umrawal, Abhishek K.
    Aggarwal, Vaneet
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2021, 22 (04) : 2035 - 2047
  • [50] Model-Free Ultra Reliable Low Latency Communication (URLLC): a Deep Reinforcement Learning Framework
    Kasgari, Ali Taleb Zadeh
    Saad, Walid
    [J]. ICC 2019 - 2019 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2019,