Reinforcement learning with dynamic convex risk measures

Cited by: 6
Authors
Coache, Anthony [1]
Jaimungal, Sebastian [1, 2]
Affiliations
[1] Univ Toronto, Dept Stat Sci, Toronto, ON, Canada
[2] Univ Oxford, Oxford Man Inst, Oxford, England
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
actor-critic algorithm; dynamic risk measures; financial hedging; policy gradient; reinforcement learning; robot control; time-consistency; trading strategies; APPROXIMATE; NETWORKS;
DOI
10.1111/mafi.12388
Chinese Library Classification number
F8 [Public Finance, Finance];
Subject classification code
0202;
Abstract
We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor-critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.
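The time-consistent dynamic programming principle mentioned in the abstract can be illustrated with a small nested-risk recursion. The sketch below is not the authors' algorithm, which is model-free and uses neural networks. It assumes a known finite Markov chain under a fixed policy and uses CVaR as one example of a convex one-step risk measure. The names `cvar`, `evaluate_policy`, and the toy `transitions` table are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Tuple

def cvar(outcomes: List[Tuple[float, float]], alpha: float) -> float:
    """CVaR at level alpha of a discrete cost distribution.

    outcomes holds (cost, probability) pairs. The result is the average
    cost over the worst (1 - alpha) share of the probability mass.
    """
    tail = 1.0 - alpha
    remaining = tail
    total = 0.0
    # Walk through the costs from worst (largest) to best.
    for cost, prob in sorted(outcomes, key=lambda cp: cp[0], reverse=True):
        take = min(prob, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / tail

def evaluate_policy(
    transitions: Dict[Hashable, List[Tuple[Hashable, float, float]]],
    horizon: int,
    alpha: float,
) -> Dict[Hashable, float]:
    """Nested (time-consistent) risk of cumulative cost under a fixed policy.

    transitions[s] lists (next_state, probability, cost) triples, an
    assumed tabular model. The recursion applied backward in time is
        V_t(s) = CVaR_alpha( cost + V_{t+1}(next_state) ),  V_T = 0.
    """
    value = {s: 0.0 for s in transitions}
    for _ in range(horizon):
        value = {
            s: cvar([(cost + value[nxt], p) for nxt, p, cost in moves], alpha)
            for s, moves in transitions.items()
        }
    return value

# Toy two-state chain (hypothetical numbers).
transitions = {
    "a": [("a", 0.5, 0.0), ("b", 0.5, 2.0)],
    "b": [("b", 1.0, 1.0)],
}
risk_averse = evaluate_policy(transitions, horizon=2, alpha=0.5)
risk_neutral = evaluate_policy(transitions, horizon=2, alpha=0.0)
```

With `alpha=0.0` the one-step measure reduces to the expectation, so the recursion returns the ordinary expected cumulative cost. Raising `alpha` weights the worst outcomes more heavily at every step, which is what distinguishes a dynamic risk measure from applying a single risk measure to the terminal cost.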
Pages: 557-587
Number of pages: 31