SEQUENTIAL MONTE CARLO BANDITS

Cited by: 0
Authors
Urteaga, Inigo [1 ,2 ]
Wiggins, Chris h. [3 ]
Affiliations
[1] BCAM Basque Ctr Appl Math, Bilbao, Spain
[2] Basque Fdn Sci, Ikerbasque, Bilbao, Spain
[3] Columbia Univ, Dept Appl Phys & Appl Math, New York, NY USA
Keywords
Sequential Monte Carlo; multi-armed bandits; restless bandits; linear dynamical systems; nonlinear reward functions; HIDDEN ARMA PROCESSES; PARAMETER-ESTIMATION; PARTICLE FILTERS; BAYESIAN-ESTIMATION; STATE; ALLOCATION;
DOI
10.3934/fods.2024005
Chinese Library Classification
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
We extend state-of-the-art Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods. A MAB is a sequential decision-making problem in which the goal is to learn a policy that maximizes long-term payoff, while only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, closed-form expressions for these statistics are analytically intractable except in simple, stationary cases. We here use SMC to estimate the statistics that Bayesian MAB agents compute, and devise flexible policies that address a rich class of bandit problems: MABs with nonlinear, stateless and context-dependent reward distributions that evolve over time. We show how non-stationary bandits, in which the time dynamics are modeled via linear dynamical systems, can be successfully addressed by SMC-based Bayesian bandit agents. We empirically demonstrate good regret performance of the proposed SMC-based bandit policies in several MAB scenarios that have remained elusive, i.e., non-stationary bandits with nonlinear rewards.
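The abstract's core idea — tracking each arm's posterior with a particle filter and acting via Thompson sampling — can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes Gaussian rewards whose latent means follow a random walk (a simple linear dynamical system), a bootstrap particle filter per arm, and hypothetical names (`SMCArm`, `run`, `drift_std`, `obs_std`) chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class SMCArm:
    """Bootstrap particle filter tracking one arm's drifting Gaussian reward mean."""

    def __init__(self, n_particles=500, drift_std=0.05, obs_std=0.5):
        # Particles approximate the posterior over the arm's latent mean reward.
        self.p = rng.normal(0.0, 1.0, n_particles)
        self.drift_std = drift_std
        self.obs_std = obs_std

    def propagate(self):
        # Transition step: the latent mean follows a random walk, so uncertainty
        # grows for arms that are not being pulled.
        self.p += rng.normal(0.0, self.drift_std, self.p.size)

    def update(self, reward):
        # Reweight particles by the Gaussian reward likelihood, then resample
        # back to equal weights (bootstrap filter).
        loglik = -0.5 * ((reward - self.p) / self.obs_std) ** 2
        w = np.exp(loglik - loglik.max())
        w /= w.sum()
        idx = rng.choice(self.p.size, size=self.p.size, p=w)
        self.p = self.p[idx]

    def thompson_sample(self):
        # One particle is one posterior draw of the current mean reward.
        return rng.choice(self.p)

def run(true_means, T=200, obs_std=0.5):
    """Thompson sampling over SMC posteriors in a stationary Gaussian bandit."""
    arms = [SMCArm(obs_std=obs_std) for _ in true_means]
    rewards = []
    for _ in range(T):
        for a in arms:
            a.propagate()
        k = int(np.argmax([a.thompson_sample() for a in arms]))
        r = rng.normal(true_means[k], obs_std)
        arms[k].update(r)
        rewards.append(r)
    return arms, rewards
```

The key design point mirrors the abstract: because the sufficient statistics of the evolving reward distribution have no closed form, each posterior draw used by Thompson sampling is replaced by a draw from the particle approximation, and the propagate/update cycle keeps that approximation current as rewards drift.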
Pages: 57