Hedging using reinforcement learning: Contextual k-armed bandit versus Q-learning

Cited by: 3
Authors:
Cannelli, Loris [1 ,4 ]
Nuti, Giuseppe [2 ]
Sala, Marzio [3 ]
Szehr, Oleg [1 ]
Affiliations:
[1] USI, Dalle Molle Inst Artificial Intelligence IDSIA, SUPSI, Lugano, Switzerland
[2] UBS Investment Bank, New York, NY USA
[3] UBS Investment Bank, Zurich, Switzerland
[4] Via La St 1, CH-6962 Lugano, Switzerland
Keywords: Hedging; Reinforcement Learning; Q-Learning; Multi-Armed Bandits; GAME; GO
DOI: 10.1016/j.jfds.2023.100101
CLC Classification: F8 [Public Finance, Finance]
Discipline Code: 0202
Abstract
The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but also undesirable due to high transaction costs. A variety of methods have been proposed to balance effective replication against losses in the incomplete-market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, with particular attention given to Recurrent Neural Network systems and variations of the Q-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, its run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, a choice motivated by the simplicity and sample efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the k-armed bandit model naturally fits the Profit and Loss formulation of hedging, providing a more accurate and sample-efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.
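To make the abstract's formulation concrete, below is a minimal sketch, in Python, of a risk-averse contextual k-armed bandit hedger. It is not the paper's implementation: the discretized hedge-ratio arms, the integer-bucketed (moneyness, time-to-maturity) context, the epsilon-greedy selection rule, and the mean-variance risk adjustment are all illustrative assumptions.

    # A minimal sketch, assuming: k arms = a discretized grid of hedge ratios,
    # context = an integer bucket of (moneyness, time-to-maturity), and
    # reward = one-step hedged P&L. Risk aversion enters by penalizing the
    # estimated mean P&L of an arm with its running variance.
    import numpy as np

    class RiskAverseContextualBandit:
        def __init__(self, n_contexts, n_arms, risk_aversion=1.0, epsilon=0.1):
            self.n_arms = n_arms
            self.epsilon = epsilon                        # exploration rate
            self.risk_aversion = risk_aversion            # variance penalty weight
            self.counts = np.zeros((n_contexts, n_arms))  # pulls per (context, arm)
            self.mean = np.zeros((n_contexts, n_arms))    # running mean P&L
            self.m2 = np.zeros((n_contexts, n_arms))      # running sum of sq. deviations

        def select_arm(self, context):
            # Epsilon-greedy over the risk-adjusted value of each arm.
            if np.random.rand() < self.epsilon:
                return np.random.randint(self.n_arms)
            n = self.counts[context]
            var = np.where(n > 1, self.m2[context] / np.maximum(n - 1, 1), 0.0)
            return int(np.argmax(self.mean[context] - self.risk_aversion * var))

        def update(self, context, arm, pnl):
            # Welford's online mean/variance update, so the agent can learn
            # one observed P&L at a time, including from real-world trades.
            self.counts[context, arm] += 1
            n = self.counts[context, arm]
            delta = pnl - self.mean[context, arm]
            self.mean[context, arm] += delta / n
            self.m2[context, arm] += delta * (pnl - self.mean[context, arm])

    # Hypothetical usage: 11 arms spanning hedge ratios 0.0, 0.1, ..., 1.0.
    bandit = RiskAverseContextualBandit(n_contexts=50, n_arms=11)
    context = 7                             # a bucketed (moneyness, maturity) index
    arm = bandit.select_arm(context)
    hedge_ratio = arm / 10.0                # map the chosen arm back to a hedge ratio
    bandit.update(context, arm, pnl=-0.03)  # feed back the observed hedged P&L

In the spirit of the abstract's limiting claim, with zero transaction costs and the variance penalty switched off, the greedy arm in such a scheme should concentrate on the hedge ratio closest to the replicating (Black-Scholes) delta for each context bucket.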
Pages: 22