Learning Best Response Strategies for Agents in Ad Exchanges

Authors
Gerakaris, Stavros [1]
Ramamoorthy, Subramanian [1]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Scotland
Keywords
Ad exchanges; Stochastic game; Censored observations; Harsanyi-Bellman Ad Hoc Coordination; Kaplan-Meier estimator
DOI
10.1007/978-3-030-14174-5_6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Ad exchanges are widely used in platforms for online display advertising. Autonomous agents operating in these exchanges must learn policies for interacting profitably with a diverse, continually changing, but unknown market. We consider this problem from the perspective of a publisher, strategically interacting with an advertiser through a posted price mechanism. The learning problem for this agent is made difficult by the fact that information is censored, i.e., the publisher learns whether an impression is sold but receives no other quantitative information. We address this problem using the Harsanyi-Bellman Ad Hoc Coordination (HBA) algorithm [1,3], which conceptualises this interaction in terms of a Stochastic Bayesian Game and arrives at optimal actions by best responding with respect to probabilistic beliefs maintained over a candidate set of opponent behaviour profiles. We adapt and apply HBA to the censored information setting of ad exchanges. To address the case of stochastic opponents, we also devise a strategy based on a Kaplan-Meier estimator for opponent modelling. We evaluate the proposed method in simulations, which show that HBA-KM achieves a substantially better competitive ratio and lower variance of return than baselines, including a Q-learning agent and a UCB-based online learning agent, and performs comparably to the offline optimal algorithm.
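
The belief-based best response described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold-style opponent types, the `noise` parameter, the price grid, and all function names are assumptions made for illustration, and the best response is myopic (one-step expected revenue), whereas HBA as described plans within a Stochastic Bayesian Game. The only feedback used is the censored signal of whether the impression sold.

```python
from typing import Callable, Dict, List

AcceptProb = Callable[[float], float]


def make_threshold_type(threshold: float, noise: float = 0.05) -> AcceptProb:
    """Hypothetical advertiser type: buys with high probability when the
    posted price is at or below its threshold, rarely otherwise."""
    def accept_prob(price: float) -> float:
        return 1.0 - noise if price <= threshold else noise
    return accept_prob


def update_beliefs(beliefs: Dict[str, float], types: Dict[str, AcceptProb],
                   price: float, sold: bool) -> Dict[str, float]:
    """Bayes update over candidate types from the sold / not-sold signal."""
    unnormalised = {}
    for name, prior in beliefs.items():
        p = types[name](price)
        likelihood = p if sold else 1.0 - p
        unnormalised[name] = prior * likelihood
    total = sum(unnormalised.values())
    return {name: w / total for name, w in unnormalised.items()}


def best_response(beliefs: Dict[str, float], types: Dict[str, AcceptProb],
                  prices: List[float]) -> float:
    """Myopic best response: the posted price with the highest expected
    one-step revenue under the current beliefs (an assumed simplification)."""
    def expected_revenue(price: float) -> float:
        return sum(b * price * types[name](price)
                   for name, b in beliefs.items())
    return max(prices, key=expected_revenue)


# Two assumed candidate types and a uniform prior over them.
types = {"low": make_threshold_type(0.3), "high": make_threshold_type(0.8)}
beliefs = {"low": 0.5, "high": 0.5}
prices = [0.3, 0.8]

first_price = best_response(beliefs, types, prices)
# Suppose the impression sold at that price; update the beliefs accordingly.
posterior = update_beliefs(beliefs, types, first_price, sold=True)
```

In this toy setting a sale at the higher price shifts most of the belief mass onto the "high" type. A Kaplan-Meier-based opponent model, as named in the abstract, would replace the fixed candidate types with an estimate built from the censored observations; that part is not sketched here.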
Pages: 77-93
Number of pages: 17