SEQUENTIAL MONTE CARLO BANDITS

Cited by: 0
Authors
Urteaga, Inigo [1 ,2 ]
Wiggins, Chris h. [3 ]
Affiliations
[1] BCAM Basque Ctr Appl Math, Bilbao, Spain
[2] Basque Fdn Sci, Ikerbasque, Bilbao, Spain
[3] Columbia Univ, Dept Appl Phys & Appl Math, New York, NY USA
Keywords
Sequential Monte Carlo; multi-armed bandits; restless bandits; linear dynamical systems; nonlinear reward functions; HIDDEN ARMA PROCESSES; PARAMETER-ESTIMATION; PARTICLE FILTERS; BAYESIAN-ESTIMATION; STATE; ALLOCATION;
DOI
10.3934/fods.2024005
Chinese Library Classification
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
We extend state-of-the-art Bayesian multi-armed bandit (MAB) algorithms beyond their original setting by making use of sequential Monte Carlo (SMC) methods. A MAB is a sequential decision-making problem in which the goal is to learn a policy that maximizes long-term payoff, while only the reward of the executed action is observed. In the stochastic MAB, the reward for each action is generated from an unknown distribution, often assumed to be stationary. To decide which action to take next, a MAB agent must learn the characteristics of the unknown reward distribution, e.g., compute its sufficient statistics. However, closed-form expressions for these statistics are analytically intractable except in simple, stationary cases. We here use SMC to estimate the statistics that Bayesian MAB agents compute, and devise flexible policies that address a rich class of bandit problems: MABs with nonlinear, stateless and context-dependent reward distributions that evolve over time. We show how non-stationary bandits, in which the time dynamics are modeled via linear dynamical systems, can be successfully addressed by SMC-based Bayesian bandit agents. We empirically demonstrate good regret performance of the proposed SMC-based bandit policies in several MAB scenarios that have remained elusive, i.e., non-stationary bandits with nonlinear rewards.
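The abstract's core idea — tracking each arm's posterior with a particle filter and acting via Thompson sampling — can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes Gaussian rewards whose latent means follow a random walk (a simple linear dynamical system), a bootstrap particle filter per arm, and hypothetical names (`SMCArm`, `run`, `drift_std`, `obs_std`) chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

class SMCArm:
    """Bootstrap particle filter tracking one arm's drifting Gaussian reward mean."""

    def __init__(self, n_particles=500, drift_std=0.05, obs_std=0.5):
        # Particles approximate the posterior over the arm's latent mean reward.
        self.p = rng.normal(0.0, 1.0, n_particles)
        self.drift_std = drift_std
        self.obs_std = obs_std

    def propagate(self):
        # Transition step: the latent mean follows a random walk, so uncertainty
        # grows for arms that are not being pulled.
        self.p += rng.normal(0.0, self.drift_std, self.p.size)

    def update(self, reward):
        # Reweight particles by the Gaussian reward likelihood, then resample
        # back to equal weights (bootstrap filter).
        loglik = -0.5 * ((reward - self.p) / self.obs_std) ** 2
        w = np.exp(loglik - loglik.max())
        w /= w.sum()
        idx = rng.choice(self.p.size, size=self.p.size, p=w)
        self.p = self.p[idx]

    def thompson_sample(self):
        # One particle is one posterior draw of the current mean reward.
        return rng.choice(self.p)

def run(true_means, T=200, obs_std=0.5):
    """Thompson sampling over SMC posteriors in a stationary Gaussian bandit."""
    arms = [SMCArm(obs_std=obs_std) for _ in true_means]
    rewards = []
    for _ in range(T):
        for a in arms:
            a.propagate()
        k = int(np.argmax([a.thompson_sample() for a in arms]))
        r = rng.normal(true_means[k], obs_std)
        arms[k].update(r)
        rewards.append(r)
    return arms, rewards
```

The key design point mirrors the abstract: because the sufficient statistics of the evolving reward distribution have no closed form, each posterior draw used by Thompson sampling is replaced by a draw from the particle approximation, and the propagate/update cycle keeps that approximation current as rewards drift.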
Pages: 57