A Reward Shaping Approach for Reserve Price Optimization using Deep Reinforcement Learning

Cited by: 0

Authors:

Afshar, Reza Refaei [1]

Rhuggenaath, Jason [1]

Zhang, Yingqian [1]

Kaymak, Uzay [1]

Affiliations:

[1] Eindhoven University of Technology, Eindhoven, Netherlands

Source:

2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021

Keywords:

Real Time Bidding; Reinforcement Learning; Reward Shaping; Deep Learning;

DOI:

10.1109/IJCNN52387.2021.9533817

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Real Time Bidding is the process of selling and buying online advertisements in real-time auctions. Real-time auctions are run by header bidding partners or ad exchanges to sell publishers' ad placements. Ad exchanges run second-price auctions, and a reserve price should be set for each ad placement or impression. This reserve price is normally determined by the bids of header bidding partners. However, the ad exchange may outbid higher reserve prices, so optimizing this value strongly affects revenue. In this paper, we propose a deep reinforcement learning approach for adjusting the reserve price of individual impressions using contextual information. Normally, ad exchanges do not return any information about the auction except the sold-unsold status. This binary feedback is not suitable for maximizing revenue because it contains no explicit information about the revenue. To enrich the reward function, we develop a novel reward shaping approach that provides an informative reward signal for the reinforcement learning agent. In this approach, different intervals of the reserve price get different weights, and the reward value of each interval is learned through a search procedure. Using a simulator, we test our method on a set of impressions. The results show that the proposed method outperforms the baselines in terms of revenue.
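
The two mechanisms named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the interval edges, the weights, and the choice of a zero reward for unsold impressions are all hypothetical, and the weights here are hand-picked placeholders rather than values learned through the search procedure the abstract mentions. The first function computes seller revenue in a second-price auction with a reserve price; the second turns the binary sold-unsold feedback into a reward that depends on the interval the reserve price falls in.

```python
from bisect import bisect_right


def second_price_revenue(bids, reserve):
    """Seller revenue in a second-price auction with a reserve price.

    The impression is unsold (revenue 0.0) if the highest bid is below the
    reserve. Otherwise the winner pays the larger of the second-highest bid
    and the reserve price.
    """
    ranked = sorted(bids, reverse=True)
    if not ranked or ranked[0] < reserve:
        return 0.0
    second = ranked[1] if len(ranked) > 1 else 0.0
    return max(second, reserve)


def shaped_reward(reserve, sold, edges, weights):
    """Hypothetical interval-based reward built from sold-unsold feedback.

    `edges` are ascending interval boundaries and `weights` holds one reward
    value per interval, so len(weights) == len(edges) + 1. Returning 0.0 for
    an unsold impression is an assumption made for this sketch.
    """
    if not sold:
        return 0.0
    return weights[bisect_right(edges, reserve)]


# Placeholder intervals: [0, 1), [1, 2), [2, inf) with hand-picked weights.
EDGES = [1.0, 2.0]
WEIGHTS = [0.2, 0.5, 1.0]

revenue = second_price_revenue([3.0, 1.0], reserve=2.0)
reward = shaped_reward(2.0, sold=revenue > 0, edges=EDGES, weights=WEIGHTS)
```

In this sketch a sold impression with a higher reserve price earns a larger reward, which gives the agent a graded signal even though the exchange only reports whether the impression sold.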

Pages: 8

50 records in total

[1] Hindsight Reward Shaping in Deep Reinforcement Learning
de Villiers, Byron
Sabatta, Deon
2020 INTERNATIONAL SAUPEC/ROBMECH/PRASA CONFERENCE, 2020, : 653 - 659
[2] An Improvement on Mapless Navigation with Deep Reinforcement Learning: A Reward Shaping Approach
Alipanah, Arezoo
Moosavian, S. Ali A.
2022 10TH RSI INTERNATIONAL CONFERENCE ON ROBOTICS AND MECHATRONICS (ICROM), 2022, : 261 - 266
[3] Generalization in Deep Reinforcement Learning for Robotic Navigation by Reward Shaping
Miranda, Victor R. F.
Neto, Armando A.
Freitas, Gustavo M.
Mozelli, Leonardo A.
IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2024, 71 (06) : 6013 - 6020
[4] Using Natural Language for Reward Shaping in Reinforcement Learning
Goyal, Prasoon
Niekum, Scott
Mooney, Raymond J.
PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2385 - 2391
[5] Belief Reward Shaping in Reinforcement Learning
Marom, Ofir
Rosman, Benjamin
THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3762 - 3769
[6] Reward Shaping in Episodic Reinforcement Learning
Grzes, Marek
AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 565 - 573
[7] Multigrid Reinforcement Learning with Reward Shaping
Grzes, Marek
Kudenko, Daniel
ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 357 - 366
[8] Offline reward shaping with scaling human preference feedback for deep reinforcement learning
Li, Jinfeng
Luo, Biao
Xu, Xiaodong
Huang, Tingwen
NEURAL NETWORKS, 2025, 181
[9] Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management
De Moor, Bram J.
Gijsbrechts, Joren
Boute, Robert N.
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2022, 301 (02) : 535 - 545
[10] Online VNF Placement using Deep Reinforcement Learning and Reward Constrained Policy Optimization
Mohamed, Ramy
Avgeris, Marios
Leivadeas, Aris
Lambadaris, Ioannis
2024 IEEE INTERNATIONAL MEDITERRANEAN CONFERENCE ON COMMUNICATIONS AND NETWORKING, MEDITCOM 2024, 2024, : 269 - 274