Learning Parameterized Policies for Markov Decision Processes through Demonstrations

Cited by: 0
Authors
Hanawal, Manjesh K. [1]
Liu, Hao [2, 3, 4]
Zhu, Henghui [2, 3]
Paschalidis, Ioannis Ch. [2, 3]
Affiliations
[1] Indian Inst Technol, IEOR, Powai 400076, MH, India
[2] Boston Univ, Dept Elect & Comp Engn, Boston, MA 02215 USA
[3] Boston Univ, Div Syst Engn, Boston, MA 02215 USA
[4] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310027, Zhejiang, Peoples R China
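Keywords
Machine learning; Markov decision processes; reinforcement learning
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract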
We consider the problem of learning a policy used by an agent in a Markov decision process using state-action samples. We focus on a class of parameterized policies and use ℓ1-regularized logistic regression to train a policy that best fits the observed state-action pairs (demonstrations). We bound the difference in average reward between the trained and the original policy (regret) in terms of the generalization error and sensitivity parameters of the Markov chain. Specifically, we use techniques from sample complexity theory to relate regret to the generalization error and techniques from sensitivity analysis of the stationary distribution of Markov chains to relate regret to the ergodic coefficient of the Markov chain. We demonstrate the effectiveness of our method on a synthetic example.
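The fitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-action setting, the synthetic features, the parameter vector `true_theta`, the regularization weight `lam`, and the proximal-gradient solver are all assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    return math.copysign(max(abs(w) - t, 0.0), w)


def fit_l1_logistic(features, actions, lam=0.01, step=1.0, iters=200):
    """Minimize mean logistic loss + lam * ||theta||_1 by proximal gradient descent."""
    n, d = len(features), len(features[0])
    theta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for x, a in zip(features, actions):
            err = sigmoid(dot(theta, x)) - a  # log-loss derivative w.r.t. the logit
            for j in range(d):
                grad[j] += err * x[j] / n
        # Gradient step on the smooth loss, then shrinkage for the l1 penalty.
        theta = [soft_threshold(t - step * g, step * lam)
                 for t, g in zip(theta, grad)]
    return theta


# Hypothetical demonstrations: each state is summarized by 4 features, and the
# demonstrating agent picks action 1 with probability sigmoid(true_theta . x).
# Only the first two features matter (an assumption chosen to show sparsity).
rng = random.Random(0)
true_theta = [2.0, -1.5, 0.0, 0.0]
features = [[rng.uniform(-1.0, 1.0) for _ in true_theta] for _ in range(400)]
actions = [1 if rng.random() < sigmoid(dot(true_theta, x)) else 0
           for x in features]

theta_hat = fit_l1_logistic(features, actions)
print("estimated parameters:", [round(t, 3) for t in theta_hat])
```

The ℓ1 penalty enters only through `soft_threshold`, which pulls small coefficients toward zero and so favors policies that depend on few state features. A larger `lam` yields a sparser but more biased estimate.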
Pages: 7087-7092
Number of pages: 6
Related papers
50 in total
  • [1] Learning Parameterized Prescription Policies and Disease Progression Dynamics using Markov Decision Processes
    Zhu, Henghui
    Xu, Tingting
    Paschalidis, Ioannis Ch
    [J]. 2019 AMERICAN CONTROL CONFERENCE (ACC), 2019, : 3438 - 3443
  • [2] Learning Policies for Markov Decision Processes in Continuous Spaces
    Paternain, Santiago
    Bazerque, Juan Andres
    Small, Austin
    Ribeiro, Alejandro
    [J]. 2018 IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2018, : 4751 - 4758
  • [3] Learning Policies for Markov Decision Processes From Data
    Hanawal, Manjesh Kumar
    Liu, Hao
    Zhu, Henghui
    Paschalidis, Ioannis Ch.
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2019, 64 (06) : 2298 - 2309
  • [4] Variance minimization of parameterized Markov decision processes
    Li Xia
    [J]. Discrete Event Dynamic Systems, 2018, 28 : 63 - 81
  • [5] Variance minimization of parameterized Markov decision processes
    Xia, Li
    [J]. DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS, 2018, 28 (01) : 63 - 81
  • [6] Learning deterministic policies in partially observable Markov decision processes
    Miyazaki, K
    Kobayashi, S
    [J]. INTELLIGENT AUTONOMOUS SYSTEMS: IAS-5, 1998, : 250 - 257
  • [7] On Markov policies for minimax decision processes
    Iwamoto, S
    Tsurusaki, K
    [J]. JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 2001, 253 (01) : 58 - 78
  • [8] Parameterized Penalties in the Dual Representation of Markov Decision Processes
    Ye, Fan
    Zhou, Enlu
    [J]. 2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 870 - 876
  • [9] Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
    Brazdil, Tomas
    Chatterjee, Krishnendu
    Novotny, Petr
    Vahala, Jiri
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9794 - 9801
  • [10] Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
    Roy, Arghyadip
    Borkar, Vivek
    Karandikar, Abhay
    Chaporkar, Prasanna
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2022, 67 (07) : 3722 - 3729