Approximate Newton methods for policy search in Markov decision processes

Cited: 0
Authors
Furmston, Thomas; Lever, Guy; Barber, David
Source
Journal of Machine Learning Research (Microtome Publishing), 2016, Vol. 17
Keywords
Inverse problems; Least squares approximations; Gaussian distribution; Newton-Raphson method; Markov processes
DOI
Not available
Abstract
Approximate Newton methods are standard optimization tools which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, while alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov decision processes (MDPs). We first analyse the structure of the Hessian of the total expected reward, which is a standard objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs, and we use this analysis to motivate two Gauss-Newton methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods drop certain terms in the Hessian. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees, including guaranteed ascent directions, invariance to affine transformations of the parameter space, and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains. © 2016 Thomas Furmston, Guy Lever, and David Barber.
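As a rough illustration of the Hessian structure the abstract refers to, the sketch below uses standard policy-gradient notation; the symbols (J, Q, \pi_\theta, \mathcal{H}_2) and the exact grouping of terms are illustrative assumptions, not quotations from the paper. In the likelihood-ratio form, the gradient of the expected reward is

\[
\nabla_\theta J(\theta) = \mathbb{E}_{(s,a)\sim p_\theta}\!\left[ Q^{\pi_\theta}(s,a)\, \nabla_\theta \log \pi_\theta(a \mid s) \right],
\]

and differentiating again yields a Hessian containing, among other terms, a log-policy curvature term

\[
\mathcal{H}_2(\theta) = \mathbb{E}_{(s,a)\sim p_\theta}\!\left[ Q^{\pi_\theta}(s,a)\, \nabla^2_\theta \log \pi_\theta(a \mid s) \right].
\]

A Gauss-Newton-style update of the kind described in the abstract replaces the full Hessian with such a partial term, e.g.

\[
\theta_{k+1} = \theta_k - \alpha\, \mathcal{H}_2(\theta_k)^{-1} \nabla_\theta J(\theta_k).
\]

If the rewards are non-negative and the policy is log-concave in \theta (assumptions made here for the sketch), \mathcal{H}_2 is negative definite, so this step is a guaranteed ascent direction, consistent with the properties claimed in the abstract; the precise terms dropped by the paper's two Gauss-Newton variants are specified in the full JMLR article.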
Related papers (50 in the source record)
  • [1] Furmston, Thomas; Lever, Guy; Barber, David. Approximate Newton Methods for Policy Search in Markov Decision Processes. Journal of Machine Learning Research, 2016, 17.
  • [2] Moreno-Díaz, A.; Virto, M. A.; Martín, J.; Insua, D. R. Approximate solutions to semi-Markov decision processes through Markov chain Monte Carlo methods. Computer Aided Systems Theory - EUROCAST 2003, 2003, 2809: 151-162.
  • [3] Fern, Alan; Yoon, Sungwook; Givan, Robert. Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research, 2006, 25: 75-118.
  • [4] Fern, A.; Yoon, S.; Givan, R. Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research, 2006, 25: 75-118.
  • [6] Even-Dar, E.; Mansour, Y. Approximate equivalence of Markov decision processes. Learning Theory and Kernel Machines, 2003, 2777: 581-594.
  • [7] Haesaert, Sofie; Soudjani, Sadegh Esmaeil Zadeh; Abate, Alessandro. Verification of general Markov decision processes by approximate similarity relations and policy refinement. SIAM Journal on Control and Optimization, 2017, 55(4): 2333-2367.
  • [8] Abate, Alessandro; Ceska, Milan; Kwiatkowska, Marta. Approximate policy iteration for Markov decision processes via quantitative adaptive aggregations. Automated Technology for Verification and Analysis (ATVA 2016), 2016, 9938: 13-31.
  • [9] Haesaert, Sofie; Abate, Alessandro; Van den Hof, Paul M. J. Verification of general Markov decision processes by approximate similarity relations and policy refinement. Quantitative Evaluation of Systems (QEST 2016), 2016, 9826: 227-243.
  • [10] Haesaert, Sofie; Soudjani, Sadegh; Abate, Alessandro. Temporal logic control of general Markov decision processes by approximate policy refinement. IFAC-PapersOnLine, 2018, 51(16): 73-78.