When can the two-armed bandit algorithm be trusted?

Cited by: 22
Authors
Lamberton, D
Pagès, G
Tarrès, P
Institutions
[1] Univ Marne La Vallee, Lab Anal & Math Appl, UMR 8050, F-77454 Marne La Vallee 2, France
[2] Univ Paris 06, Lab Probabilites & Modelisat Aleatoire, UMR 7599, F-75252 Paris 5, France
[3] Univ Toulouse 3, Lab Stat & Probabil, CNRS, UMR C5583, F-31062 Toulouse 4, France
Source
ANNALS OF APPLIED PROBABILITY | 2004, Vol. 14, No. 3
Keywords
two-armed bandit algorithm; stochastic approximation; learning automata; Polya urn; asset allocation;
DOI
10.1214/105051604000000350
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
We investigate the asymptotic behavior of one version of the so-called two-armed bandit algorithm. It is an example of a stochastic approximation procedure whose associated ODE has both a repulsive and an attractive equilibrium, at which the procedure is noiseless. We show that if the gain parameter is constant or goes to 0 not too fast, the algorithm falls into the noiseless repulsive equilibrium with positive probability, whereas it always converges to its natural attractive target when the gain parameter goes to zero at appropriate rates depending on the parameters of the model. We also elucidate the behavior of the constant-step algorithm when the step goes to 0. Finally, we highlight the connection between the algorithm and the Polya urn. An application to asset allocation is briefly described.
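The reward-inaction update behind the algorithm the abstract studies can be sketched as follows. This is a minimal illustration, not the paper's code: the function name, the step-size sequence `gamma`, and the success probabilities `pA`, `pB` are assumptions introduced for the example.

```python
import random

def two_armed_bandit(pA, pB, steps, gamma=lambda n: 0.5 / (n + 1)):
    """Linear reward-inaction two-armed bandit (stochastic approximation sketch).

    x is the current probability of playing arm A. When the played arm pays
    off, its selection probability is reinforced by the gain gamma(n); when
    it does not pay off, nothing changes (hence "reward-inaction").
    """
    x = 0.5  # uninformative starting point
    for n in range(steps):
        g = gamma(n)
        if random.random() < x:           # play arm A
            if random.random() < pA:      # arm A pays off
                x += g * (1.0 - x)        # reinforce A
        else:                             # play arm B
            if random.random() < pB:      # arm B pays off
                x -= g * x                # reinforce B
    return x
```

Because each update is a convex move toward 0 or 1, `x` stays in [0, 1]; the points 0 and 1 are the noiseless equilibria the abstract refers to, and whether the procedure can get trapped at the wrong one depends on how fast `gamma(n)` decreases.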
Pages: 1424 / 1454
Number of pages: 31
Related Papers
50 items in total
  • [41] Finding minimax strategy and minimax risk in a random environment (the two-armed bandit problem)
    A. V. Kolnogorov
    [J]. Automation and Remote Control, 2011, 72 : 1017 - 1027
  • [42] Basal ganglia preferentially encode context dependent choice in a two-armed bandit task
    Garenne, Andre
    Pasquereau, Benjamin
    Guthrie, Martin
    Bioulac, Bernard
    Boraud, Thomas
    [J]. FRONTIERS IN SYSTEMS NEUROSCIENCE, 2011, 5
  • [44] Optimal hysteresis for a class of deterministic deteriorating two-armed Bandit problem with switching costs
    Dusonchet, F
    Hongler, MO
    [J]. AUTOMATICA, 2003, 39 (11) : 1947 - 1955
  • [45] Decision-making without a brain: how an amoeboid organism solves the two-armed bandit
    Reid, Chris R.
    MacDonald, Hannelore
    Mann, Richard P.
    Marshall, James A. R.
    Latty, Tanya
    Garnier, Simon
    [J]. JOURNAL OF THE ROYAL SOCIETY INTERFACE, 2016, 13 (119)
  • [46] A Two-Armed Bandit Collective for Examplar Based Mining of Frequent Itemsets with Applications to Intrusion Detection
    Haugland, Vegard
    Kjolleberg, Marius
    Larsen, Svein-Erik
    Granmo, Ole-Christoffer
    [J]. COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT I, 2011, 6922 : 72 - 81
  • [47] ON ERGODIC TWO-ARMED BANDITS
    Tarres, Pierre
    Vandekerkhove, Pierre
    [J]. ANNALS OF APPLIED PROBABILITY, 2012, 22 (02): : 457 - 476
  • [48] A two-armed bandit collective for hierarchical examplar based mining of frequent itemsets with applications to intrusion detection
    Haugland, Vegard
    Kjølleberg, Marius
    Larsen, Svein-Erik
    Granmo, Ole-Christoffer
    [J]. Springer Verlag (8615)
  • [49] Development of a two-armed bipedal robot that can walk and carry objects
    Kanehiro, F
    Inaba, M
    Inoue, H
    [J]. IROS 96 - PROCEEDINGS OF THE 1996 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS - ROBOTIC INTELLIGENCE INTERACTING WITH DYNAMIC WORLDS, VOLS 1-3, 1996, : 23 - 28
  • [50] Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game
    Ole-Christoffer Granmo
    Sondre Glimsdal
    [J]. Applied Intelligence, 2013, 38 : 479 - 488