Learning Best Response Strategies for Agents in Ad Exchanges

Cited: 0
Authors
Gerakaris, Stavros [1 ]
Ramamoorthy, Subramanian [1 ]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Scotland
Source
MULTI-AGENT SYSTEMS, EUMAS 2018 | 2019, Vol. 11450
Keywords
Ad exchanges; Stochastic game; Censored observations; Harsanyi-Bellman Ad Hoc Coordination; Kaplan-Meier estimator;
DOI
10.1007/978-3-030-14174-5_6
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Ad exchanges are widely used in platforms for online display advertising. Autonomous agents operating in these exchanges must learn policies for interacting profitably with a diverse, continually changing, but unknown market. We consider this problem from the perspective of a publisher, strategically interacting with an advertiser through a posted price mechanism. The learning problem for this agent is made difficult by the fact that information is censored, i.e., the publisher knows only whether an impression is sold, and receives no other quantitative feedback. We address this problem using the Harsanyi-Bellman Ad Hoc Coordination (HBA) algorithm [1,3], which conceptualises this interaction as a Stochastic Bayesian Game and arrives at optimal actions by best responding with respect to probabilistic beliefs maintained over a candidate set of opponent behaviour profiles. We adapt and apply HBA to the censored information setting of ad exchanges. Also, addressing the case of stochastic opponents, we devise a strategy based on a Kaplan-Meier estimator for opponent modelling. We evaluate the proposed method in simulations, showing that HBA-KM achieves a substantially better competitive ratio and lower variance of return than baselines, including a Q-learning agent and a UCB-based online learning agent, and performs comparably to the offline optimal algorithm.
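The opponent-modelling component rests on the Kaplan-Meier product-limit estimator for censored data. As a minimal illustrative sketch (not the paper's implementation, whose mapping from sell/no-sell outcomes to censored bid observations is its own contribution), a self-contained estimator on generic right-censored samples looks like this; the function name and toy data are hypothetical:

```python
def kaplan_meier(samples):
    """Kaplan-Meier product-limit estimator.

    samples: iterable of (value, observed) pairs. observed=True means the
    value was seen exactly; observed=False means it is right-censored,
    i.e. the true value is only known to be >= the recorded one.
    Returns a step function as a list of (value, survival_probability),
    one entry per distinct value at which an observed event occurs.
    """
    samples = sorted(samples)
    n = len(samples)
    at_risk = n          # subjects still "alive" just before each value
    surv = 1.0           # running product-limit survival estimate
    curve = []
    i = 0
    while i < n:
        v = samples[i][0]
        events = removed = 0
        # Group ties: by convention, events and censorings at the same
        # value are all counted against the same at-risk set.
        while i < n and samples[i][0] == v:
            if samples[i][1]:
                events += 1
            removed += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((v, surv))
        at_risk -= removed
    return curve
```

For example, `kaplan_meier([(2, True), (3, True), (3, False), (5, True), (7, False)])` yields survival steps at values 2, 3, and 5; the censored points at 3 and 7 contribute to the at-risk counts without triggering a drop of their own, which is exactly what lets the estimator use partial (censored) observations rather than discarding them.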
Pages: 77-93
Page count: 17