Hedging using reinforcement learning: Contextual k-armed bandit versus Q-learning

Cited by: 3
Authors
Cannelli, Loris [1 ,4 ]
Nuti, Giuseppe [2 ]
Sala, Marzio [3 ]
Szehr, Oleg [1 ]
Affiliations
[1] USI, Dalle Molle Inst Artificial Intelligence IDSIA, SUPSI, Lugano, Switzerland
[2] UBS Investment Bank, New York, NY USA
[3] UBS Investment Bank, Zurich, Switzerland
[4] Via La St 1, CH-6962 Lugano, Switzerland
Source
The Journal of Finance and Data Science
Keywords
Hedging; Reinforcement learning; Q-learning; Multi-armed bandits; Game; Go
DOI
10.1016/j.jfds.2023.100101
Chinese Library Classification
F8 [Finance]
Discipline Classification Code
0202
Abstract
The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but also undesirable due to high transaction costs. A variety of methods have been proposed to balance effective replication against losses in the incomplete-market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, with particular attention given to Recurrent Neural Network systems and variations of the Q-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, its run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, a choice motivated by the simplicity and sample-efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the k-armed bandit model naturally fits the Profit and Loss formulation of hedging, providing a more accurate and more sample-efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.
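The abstract describes casting a single hedging decision as a risk-averse contextual k-armed bandit: the context encodes the option state, the k arms are candidate hedge ratios, and the reward is a risk-adjusted Profit and Loss. The sketch below only illustrates that framing and is not the authors' implementation; the geometric-Brownian-motion toy market, the mean-variance reward, the context bucketing, and all parameter values are assumptions made for this example.

import numpy as np

rng = np.random.default_rng(0)

# k arms = k candidate hedge ratios (a discretised set of "delta" choices).
K_ARMS = 11
HEDGE_RATIOS = np.linspace(0.0, 1.0, K_ARMS)

# Context: coarse buckets of (moneyness, time to maturity).
N_MONEY, N_TAU = 10, 10
N_CONTEXTS = N_MONEY * N_TAU

def context_index(spot, strike, tau, tau_max=1.0):
    """Map the continuous option state to a discrete context id (toy bucketing)."""
    m = int(np.clip((spot / strike - 0.8) / 0.4 * N_MONEY, 0, N_MONEY - 1))
    t = int(np.clip(tau / tau_max * N_TAU, 0, N_TAU - 1))
    return m * N_TAU + t

value = np.zeros((N_CONTEXTS, K_ARMS))   # running estimate of expected reward
counts = np.zeros((N_CONTEXTS, K_ARMS))  # number of pulls per (context, arm)

def reward(pnl, hedge, cost_rate=0.002, risk_aversion=1.0):
    """Risk-adjusted one-period P&L: penalise transaction cost and P&L variance."""
    return pnl - cost_rate * abs(hedge) - risk_aversion * pnl ** 2

def choose_arm(ctx, eps=0.1):
    """Epsilon-greedy arm selection within the given context."""
    if rng.random() < eps:
        return int(rng.integers(K_ARMS))
    return int(np.argmax(value[ctx]))

def update(ctx, arm, r):
    """Incremental sample-average update; the same rule can run online on real fills."""
    counts[ctx, arm] += 1
    value[ctx, arm] += (r - value[ctx, arm]) / counts[ctx, arm]

# Toy training loop on a one-period geometric-Brownian-motion market (assumption).
S0, STRIKE, SIGMA, DT, TAU = 100.0, 100.0, 0.2, 1.0 / 52.0, 0.5
for episode in range(20_000):
    spot = S0 * np.exp(rng.normal(-0.5 * SIGMA ** 2 * TAU, SIGMA * np.sqrt(TAU)))
    ctx = context_index(spot, STRIKE, TAU)
    arm = choose_arm(ctx)
    hedge = HEDGE_RATIOS[arm]
    # One-period stock move; option P&L proxied by the change in intrinsic value.
    ds = spot * (np.exp(rng.normal(0.0, SIGMA * np.sqrt(DT))) - 1.0)
    option_pnl = max(spot + ds - STRIKE, 0.0) - max(spot - STRIKE, 0.0)
    update(ctx, arm, reward(hedge * ds - option_pnl, hedge))

In this sketch the risk_aversion term is what makes the bandit risk-averse: it steers the greedy arm in each context toward the variance-minimising hedge ratio rather than the one with the highest expected P&L, while cost_rate discourages large hedge positions.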
Pages: 22