Model-Free Deep Inverse Reinforcement Learning by Logistic Regression

Cited by: 1
Authors
Eiji Uchibe
Affiliations
[1] ATR Computational Neuroscience Labs., Department of Brain Robot Interface
[2] Okinawa Institute of Science and Technology Graduate University, Neural Computation Unit
Source
Neural Processing Letters | 2018 / Vol. 47
Keywords
Inverse reinforcement learning; Deep learning; Density ratio estimation; Logistic regression
DOI
Not available
Abstract
This paper proposes model-free deep inverse reinforcement learning to find nonlinear reward function structures. We formulate inverse reinforcement learning as a problem of density ratio estimation, and show that the log of the ratio between an optimal state transition and a baseline one is given by a part of the reward and the difference of the value functions under the framework of linearly solvable Markov decision processes. The logarithm of the density ratio is efficiently calculated by binomial logistic regression, in which the classifier is constructed from the reward and the state value function. The classifier tries to discriminate between samples drawn from the optimal state transition probability and those drawn from the baseline one. The estimated state value function is then used to initialize part of the deep neural network for forward reinforcement learning. The proposed deep forward and inverse reinforcement learning is applied to two benchmarks: Atari 2600 games and Reversi. Simulation results show that our method reaches the best performance substantially faster than the standard combination of forward and inverse reinforcement learning, as well as behavior cloning.
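The identity described in the abstract can be stated compactly. The reconstruction below is a hedged sketch consistent with the linearly solvable MDP framework the abstract invokes; the symbols q (state-dependent reward), V (state value function), gamma (discount factor), pi (optimal state transition), and b (baseline transition) are assumed notation, not copied from the paper:

% Hedged reconstruction of the identity the abstract describes: under a
% linearly solvable MDP, the log density ratio between the optimal state
% transition pi(y|x) and the baseline b(y|x) splits into a state reward
% term and a difference of state value functions.
\[
  \ln \frac{\pi(y \mid x)}{b(y \mid x)} = q(x) + \gamma V(y) - V(x)
\]

Since the right-hand side is the log-odds of a classifier separating transitions drawn from pi (label 1) and b (label 0), estimating q and V reduces to binomial logistic regression with a structured logit. The following PyTorch sketch illustrates that reduction under the same assumptions; the names (MLP, logits, train_step) and hyperparameters are illustrative, not the paper's implementation:

import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small fully connected network mapping a state vector to a scalar."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).squeeze(-1)

def logits(q_net, v_net, s, s_next, gamma):
    # Structured log-odds q(s) + gamma * V(s') - V(s): fitting the
    # classifier jointly estimates the reward and value networks.
    return q_net(s) + gamma * v_net(s_next) - v_net(s)

def train_step(q_net, v_net, optimizer, expert_batch, baseline_batch, gamma=0.99):
    """One binomial-logistic-regression step: expert transitions get
    label 1, baseline transitions label 0; minimize binary cross-entropy."""
    es, es_next = expert_batch
    bs, bs_next = baseline_batch
    pos = logits(q_net, v_net, es, es_next, gamma)
    neg = logits(q_net, v_net, bs, bs_next, gamma)
    bce = nn.BCEWithLogitsLoss()
    loss = bce(pos, torch.ones_like(pos)) + bce(neg, torch.zeros_like(neg))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

After training, v_net could seed the value part of a network for forward reinforcement learning, mirroring the initialization step the abstract describes.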
Pages: 891–905
Number of pages: 14