Learning Stochastic Optimal Policies via Gradient Descent

Cited by: 3
Authors
Massaroli, Stefano [1 ]
Poli, Michael [2 ]
Peluchetti, Stefano [3 ]
Park, Jinkyoo [2 ]
Yamashita, Atsushi [1 ]
Asama, Hajime [1 ]
Affiliations
[1] Univ Tokyo, Dept Precis Engn, Tokyo 1138656, Japan
[2] Korea Adv Inst Sci & Technol, Dept Ind & Syst, Daejeon 305335, South Korea
[3] Cogent Labs, Daejeon 305701, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Optimal control; Itô processes; Stochastic processes; Process control; Optimization; Neural networks; Noise measurement; stochastic processes; machine learning; PORTFOLIO SELECTION; CONVERGENCE; ITO;
DOI
10.1109/LCSYS.2021.3086672
CLC classification number
TP [automation and computer technology];
Discipline classification code
0812;
Abstract
We systematically develop a learning-based treatment of stochastic optimal control (SOC) based on direct optimization of parametric control policies. We derive adjoint sensitivity results for stochastic differential equations through a direct application of variational calculus. Then, given an objective function encoding the desiderata of a predetermined task, we optimize the policy parameters via iterative gradient descent. In doing so, we extend the applicability of classical SOC techniques, which often require strict assumptions on the functional form of the system and the controller. We verify the performance of the proposed approach on a continuous-time, finite-horizon portfolio optimization problem with proportional transaction costs.
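The idea in the abstract, i.e., optimizing a parametric control policy for a stochastic system by gradient descent on a simulated cost, can be illustrated with a minimal toy sketch. This is not the paper's adjoint-sensitivity method: it uses a scalar linear SDE, a one-parameter linear feedback policy, Euler-Maruyama rollouts, and a finite-difference gradient with common random numbers. All constants and names below are illustrative assumptions.

```python
import numpy as np

# Toy controlled SDE:  dX_t = (a*X_t + u_t) dt + sigma dW_t,
# with linear feedback policy u_t = -theta * X_t.
# Objective: finite-horizon quadratic cost E[ sum_k (X_k^2 + u_k^2) dt ],
# estimated from a batch of Euler-Maruyama rollouts.
a, sigma = 0.5, 0.3
dt, horizon, n_paths = 0.02, 1.0, 256
n_steps = int(horizon / dt)

def rollout_cost(theta, dW):
    """Mean cost over n_paths Euler-Maruyama rollouts under u = -theta*x."""
    x = np.ones(n_paths)                  # every path starts at X_0 = 1
    cost = np.zeros(n_paths)
    for k in range(n_steps):
        u = -theta * x
        cost += (x**2 + u**2) * dt        # running quadratic cost
        x = x + (a * x + u) * dt + sigma * dW[k]
    return cost.mean()

def grad_estimate(theta, dW, eps=1e-4):
    """Central finite difference on the same noise (common random numbers)."""
    return (rollout_cost(theta + eps, dW) - rollout_cost(theta - eps, dW)) / (2 * eps)

rng = np.random.default_rng(0)
theta, lr = 0.0, 0.1
initial_cost = rollout_cost(theta, rng.standard_normal((n_steps, n_paths)) * np.sqrt(dt))
for _ in range(200):
    # Fresh Brownian increments each iteration; scale N(0,1) draws by sqrt(dt).
    dW = rng.standard_normal((n_steps, n_paths)) * np.sqrt(dt)
    theta -= lr * grad_estimate(theta, dW)
eval_dW = np.random.default_rng(1).standard_normal((n_steps, n_paths)) * np.sqrt(dt)
final_cost = rollout_cost(theta, eval_dW)
print(f"theta = {theta:.3f}, cost: {initial_cost:.3f} -> {final_cost:.3f}")
```

Since the uncontrolled drift `a > 0` is unstable, gradient descent pushes `theta` toward a positive stabilizing gain, and the evaluated cost drops relative to the uncontrolled policy. The paper replaces the finite-difference surrogate with adjoint sensitivities, which scale to policies with many parameters.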
Pages: 1094 - 1099
Page count: 6
Related papers
50 records in total
  • [21] Online learning via congregational gradient descent
    Blackmore, Kim L.
    Williamson, Robert C.
    Mareels, Iven M. Y.
    Sethares, William A.
    MATHEMATICS OF CONTROL SIGNALS AND SYSTEMS, 1997, 10 (04) : 331 - 363
  • [23] Optimal gradient descent learning for bidirectional associative memories
    Perfetti, R.
    ELECTRONICS LETTERS, 1993, 29 (17) : 1556 - 1557
  • [24] Learning to learn by gradient descent by gradient descent
    Andrychowicz, Marcin
    Denil, Misha
    Colmenarejo, Sergio Gomez
    Hoffman, Matthew W.
    Pfau, David
    Schaul, Tom
    Shillingford, Brendan
    de Freitas, Nando
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [25] Generalization Bounds for Stochastic Gradient Descent via Localized ε-Covers
    Park, Sejun
    Simsekli, Umut
    Erdogdu, Murat A.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022, 35
  • [26] Differentially private stochastic gradient descent via compression and memorization
    Phong, Le Trieu
    Phuong, Tran Thi
    JOURNAL OF SYSTEMS ARCHITECTURE, 2023, 135
  • [27] Robust and Fast Learning of Sparse Codes With Stochastic Gradient Descent
    Labusch, Kai
    Barth, Erhardt
    Martinetz, Thomas
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2011, 5 (05) : 1048 - 1060
  • [28] Large-Scale Machine Learning with Stochastic Gradient Descent
    Bottou, Leon
    COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, : 177 - 186
  • [29] Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning
    Yang, Zhenhuan
    Lei, Yunwen
    Wang, Puyu
    Yang, Tianbao
    Ying, Yiming
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [30] Convergence diagnostics for stochastic gradient descent with constant learning rate
    Chee, Jerry
    Toulis, Panos
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84