Hedging using reinforcement learning: Contextual k-armed bandit versus Q-learning

Cited: 3
Authors
Cannelli, Loris [1 ,4 ]
Nuti, Giuseppe [2 ]
Sala, Marzio [3 ]
Szehr, Oleg [1 ]
Affiliations
[1] USI, Dalle Molle Inst Artificial Intelligence IDSIA, SUPSI, Lugano, Switzerland
[2] UBS Investment Bank, New York, NY USA
[3] UBS Investment Bank, Zurich, Switzerland
[4] Via La St 1, CH-6962 Lugano, Switzerland
Source
Keywords
Hedging; Reinforcement Learning; Q-Learning; Multi-Armed Bandits; GAME; GO
DOI
10.1016/j.jfds.2023.100101
Chinese Library Classification
F8 [Fiscal Affairs, Finance];
Subject Classification Code
0202;
Abstract
The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but also undesirable due to high transaction costs. A variety of methods have been proposed to balance effective replication against losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, with particular attention given to Recurrent Neural Network systems and variations of the Q-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, its run-time performance will primarily reflect the accuracy of the simulation, which leads back to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, a choice motivated by the simplicity and sample-efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the k-armed bandit model naturally fits the Profit-and-Loss formulation of hedging, providing a more accurate and sample-efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.
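The bandit formulation described in the abstract lends itself to a compact implementation. Below is a minimal sketch, assuming an epsilon-greedy, risk-averse contextual bandit over a grid of discretized hedge ratios with a mean-variance reward on the one-step hedged P&L; it illustrates the general technique only and is not the authors' implementation. All class, method, and parameter names (ContextualBanditHedger, select, update, risk_aversion, etc.) are hypothetical.

import numpy as np

class ContextualBanditHedger:
    def __init__(self, hedge_ratios, n_context_bins=10, risk_aversion=1.0):
        self.arms = np.asarray(hedge_ratios)        # candidate hedge ratios: the k arms
        self.n_bins = n_context_bins                # coarse discretization of the context
        self.lam = risk_aversion                    # weight on the variance penalty
        k = len(self.arms)
        self.counts = np.zeros((n_context_bins, k)) # pulls per (context bin, arm)
        self.mean = np.zeros((n_context_bins, k))   # running mean reward per (context bin, arm)
        self.m2 = np.zeros((n_context_bins, k))     # running sum of squared deviations (Welford)

    def _bin(self, context):
        # Map a scalar context normalized to [0, 1] (e.g. moneyness) to a discrete bin.
        return int(np.clip(context * self.n_bins, 0, self.n_bins - 1))

    def select(self, context, eps=0.1):
        # Epsilon-greedy choice over a mean-variance (risk-averse) score per arm.
        b = self._bin(context)
        if np.random.rand() < eps:
            return int(np.random.randint(len(self.arms)))
        var = np.where(self.counts[b] > 1,
                       self.m2[b] / np.maximum(self.counts[b] - 1, 1), 0.0)
        return int(np.argmax(self.mean[b] - self.lam * var))

    def update(self, context, arm, pnl):
        # Online update from one observed hedged P&L sample, so the model can
        # keep learning from real-world data rather than from a simulator alone.
        b = self._bin(context)
        self.counts[b, arm] += 1
        delta = pnl - self.mean[b, arm]
        self.mean[b, arm] += delta / self.counts[b, arm]
        self.m2[b, arm] += delta * (pnl - self.mean[b, arm])

Example usage with illustrative values:

hedger = ContextualBanditHedger(hedge_ratios=np.linspace(0.0, 1.0, 11))
arm = hedger.select(context=0.5)                 # pick a hedge ratio for the current state
hedger.update(context=0.5, arm=arm, pnl=-0.02)   # learn from the observed hedged P&L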
Pages: 22
Related Papers
50 records in total
  • [21] Constraints Penalized Q-learning for Safe Offline Reinforcement Learning
    Xu, Haoran
    Zhan, Xianyuan
    Zhu, Xiangyu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8753 - 8760
  • [22] Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach
    Xu, Zhi-xiong
    Cao, Lei
    Chen, Xi-liang
    Li, Chen-xi
    Zhang, Yong-liang
    Lai, Jun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09) : 2315 - 2322
  • [23] Swarm Reinforcement Learning Method Based on Hierarchical Q-Learning
    Kuroe, Yasuaki
    Takeuchi, Kenya
    Maeda, Yutaka
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
  • [24] Optimizing Q-Learning with K-FAC Algorithm
    Beltiukov, Roman
    ANALYSIS OF IMAGES, SOCIAL NETWORKS AND TEXTS (AIST 2019), 2020, 1086 : 3 - 8
  • [25] Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents
    Erdodi, Laszlo
    Sommervoll, Avald Aslaugson
    Zennaro, Fabio Massimo
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2021, 61
  • [26] A Hand Gesture Recognition System Using EMG and Reinforcement Learning: A Q-Learning Approach
    Vasconez, Juan Pablo
    Barona Lopez, Lorena Isabel
    Valdivieso Caraguay, Angel Leonardo
    Cruz, Patricio J.
    Alvarez, Robin
    Benalcazar, Marco E.
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 580 - 591
  • [27] Practical Online Reinforcement Learning for Microprocessors With Micro-Armed Bandit
    Gerogiannis, Gerasimos
    Torrellas, Josep
    IEEE MICRO, 2024, 44 (04) : 80 - 87
  • [28] Enhanced Machine Learning Algorithms: Deep Learning, Reinforcement Learning, and Q-Learning
    Park, Ji Su
    Park, Jong Hyuk
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2020, 16 (05): : 1001 - 1007
  • [29] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 26 (06) : 2771 - 2783
  • [30] CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION
    Manickam, Indu
    Lan, Andrew S.
    Baraniuk, Richard G.
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 6344 - 6348