Hedging using reinforcement learning: Contextual k-armed bandit versus Q-learning

Cited: 3
Authors
Cannelli, Loris [1 ,4 ]
Nuti, Giuseppe [2 ]
Sala, Marzio [3 ]
Szehr, Oleg [1 ]
Affiliations
[1] USI, Dalle Molle Inst Artificial Intelligence IDSIA, SUPSI, Lugano, Switzerland
[2] UBS Investment Bank, New York, NY USA
[3] UBS Investment Bank, Zurich, Switzerland
[4] Via La St 1, CH-6962 Lugano, Switzerland
Source
Keywords
Hedging; Reinforcement Learning; Q-Learning; Multi-Armed Bandits; GAME; GO
DOI
10.1016/j.jfds.2023.100101
Chinese Library Classification
F8 [Fiscal Affairs, Finance];
Subject Classification Code
0202;
Abstract
The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but also undesirable due to high transaction costs. A variety of methods have been proposed to balance effective replication against losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, with particular attention given to Recurrent Neural Network systems and variations of the Q-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, its run-time performance will primarily reflect the accuracy of the simulation, which leads back to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, a choice motivated by the simplicity and sample-efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the k-armed bandit model naturally fits the Profit-and-Loss formulation of hedging, providing a more accurate and sample-efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.
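The bandit formulation described in the abstract lends itself to a compact implementation. Below is a minimal sketch, assuming an epsilon-greedy, risk-averse contextual bandit over a grid of discretized hedge ratios with a mean-variance reward on the one-step hedged P&L; it illustrates the general technique only and is not the authors' implementation. All class, method, and parameter names (ContextualBanditHedger, select, update, risk_aversion, etc.) are hypothetical.

import numpy as np

class ContextualBanditHedger:
    def __init__(self, hedge_ratios, n_context_bins=10, risk_aversion=1.0):
        self.arms = np.asarray(hedge_ratios)        # candidate hedge ratios: the k arms
        self.n_bins = n_context_bins                # coarse discretization of the context
        self.lam = risk_aversion                    # weight on the variance penalty
        k = len(self.arms)
        self.counts = np.zeros((n_context_bins, k)) # pulls per (context bin, arm)
        self.mean = np.zeros((n_context_bins, k))   # running mean reward per (context bin, arm)
        self.m2 = np.zeros((n_context_bins, k))     # running sum of squared deviations (Welford)

    def _bin(self, context):
        # Map a scalar context normalized to [0, 1] (e.g. moneyness) to a discrete bin.
        return int(np.clip(context * self.n_bins, 0, self.n_bins - 1))

    def select(self, context, eps=0.1):
        # Epsilon-greedy choice over a mean-variance (risk-averse) score per arm.
        b = self._bin(context)
        if np.random.rand() < eps:
            return int(np.random.randint(len(self.arms)))
        var = np.where(self.counts[b] > 1,
                       self.m2[b] / np.maximum(self.counts[b] - 1, 1), 0.0)
        return int(np.argmax(self.mean[b] - self.lam * var))

    def update(self, context, arm, pnl):
        # Online update from one observed hedged P&L sample, so the model can
        # keep learning from real-world data rather than from a simulator alone.
        b = self._bin(context)
        self.counts[b, arm] += 1
        delta = pnl - self.mean[b, arm]
        self.mean[b, arm] += delta / self.counts[b, arm]
        self.m2[b, arm] += delta * (pnl - self.mean[b, arm])

Example usage with illustrative values:

hedger = ContextualBanditHedger(hedge_ratios=np.linspace(0.0, 1.0, 11))
arm = hedger.select(context=0.5)                 # pick a hedge ratio for the current state
hedger.update(context=0.5, arm=arm, pnl=-0.02)   # learn from the observed hedged P&L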
Pages: 22
Related Papers
50 records in total
  • [21] Constraints Penalized Q-learning for Safe Offline Reinforcement Learning
    Xu, Haoran
    Zhan, Xianyuan
    Zhu, Xiangyu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 8753 - 8760
  • [22] Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach
    Xu, Zhi-xiong
    Cao, Lei
    Chen, Xi-liang
    Li, Chen-xi
    Zhang, Yong-liang
    Lai, Jun
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (09) : 2315 - 2322
  • [23] Swarm Reinforcement Learning Method Based on Hierarchical Q-Learning
    Kuroe, Yasuaki
    Takeuchi, Kenya
    Maeda, Yutaka
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021,
  • [24] Optimizing Q-Learning with K-FAC Algorithm
    Beltiukov, Roman
    ANALYSIS OF IMAGES, SOCIAL NETWORKS AND TEXTS (AIST 2019), 2020, 1086 : 3 - 8
  • [25] Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents
    Erdodi, Laszlo
    Sommervoll, Avald Aslaugson
    Zennaro, Fabio Massimo
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2021, 61
  • [26] A Hand Gesture Recognition System Using EMG and Reinforcement Learning: A Q-Learning Approach
    Vasconez, Juan Pablo
    Barona Lopez, Lorena Isabel
    Valdivieso Caraguay, Angel Leonardo
    Cruz, Patricio J.
    Alvarez, Robin
    Benalcazar, Marco E.
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 580 - 591
  • [27] Practical Online Reinforcement Learning for Microprocessors With Micro-Armed Bandit
    Gerogiannis, Gerasimos
    Torrellas, Josep
    IEEE MICRO, 2024, 44 (04) : 80 - 87
  • [28] Enhanced Machine Learning Algorithms: Deep Learning, Reinforcement Learning, and Q-Learning
    Park, Ji Su
    Park, Jong Hyuk
    JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2020, 16 (05): : 1001 - 1007
  • [29] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 26 (06) : 2771 - 2783
  • [30] CONTEXTUAL MULTI-ARMED BANDIT ALGORITHMS FOR PERSONALIZED LEARNING ACTION SELECTION
    Manickam, Indu
    Lan, Andrew S.
    Baraniuk, Richard G.
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 6344 - 6348