ON ERGODIC TWO-ARMED BANDITS

Times cited: 1
Authors
Tarres, Pierre [1]
Vandekerkhove, Pierre [2]
Affiliations
[1] Univ Toulouse, Inst Math, CNRS, F-31062 Toulouse 9, France
[2] Univ Paris Est, LAMA, F-77454 Champs Sur Marne 2, Marne La Vallee, France
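Source
ANNALS OF APPLIED PROBABILITY | 2012 / Vol. 22 / No. 2
Keywords
Convergence; ergodicity; stochastic algorithms; two-armed bandit; algorithm; automata
DOI
10.1214/10-AAP751
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract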
A device has two arms with unknown deterministic payoffs, and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show, under weak ergodic assumptions on these deterministic payoffs, that the procedure eventually chooses the best arm (i.e., the one with the greatest Cesàro limit) with probability one for appropriate step sequences of the algorithm. In the case of i.i.d. payoffs, this implies, by the law of the iterated logarithm, a "quenched" version of the "annealed" result of Lamberton, Pages and Tarres [Ann. Appl. Probab. 14 (2004) 1424-1454], thus generalizing it. More precisely, if $(\eta_{l,i})_{i \in \mathbb{N}} \in \{0,1\}^{\mathbb{N}}$, $l \in \{A, B\}$, are the deterministic reward sequences we would get if we played arm $l$ at time $i$, we obtain infallibility under the same assumptions on the nonincreasing step sequences and on the payoffs as in Lamberton, Pages and Tarres [Ann. Appl. Probab. 14 (2004) 1424-1454], except that the i.i.d. assumption is replaced by the hypothesis that the empirical averages $\frac{1}{n}\sum_{i=1}^{n} \eta_{A,i}$ and $\frac{1}{n}\sum_{i=1}^{n} \eta_{B,i}$ converge, as $n$ tends to infinity, to $\theta_A$ and $\theta_B$ respectively, with rate at least $1/(\log n)^{1+\varepsilon}$ for some $\varepsilon > 0$. We also show a fallibility result, that is, convergence with positive probability to the choice of the wrong arm, which implies the corresponding result of Lamberton, Pages and Tarres [Ann. Appl. Probab. 14 (2004) 1424-1454] in the i.i.d. case.
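The abstract names the Narendra algorithm but does not state its update rule. The Python sketch below assumes the linear reward-inaction form usually associated with it: with probability $x_n$ arm A is played, and $x_n$ is pushed toward 1 when A is played and pays off, toward 0 when B is played and pays off, and left unchanged otherwise. As in the abstract, the payoffs are fixed 0/1 sequences $\eta_{A,i}$, $\eta_{B,i}$, so the only randomness is in which arm is drawn. The function names, the periodic payoff sequences and the step sequence $\gamma_n = (n+1)^{-1/2}$ are hypothetical illustrations, not taken from the paper, and the sketch makes no claim about which step sequences are "appropriate" in the paper's sense.

```python
import random
from typing import Callable, Sequence


def cesaro_average(eta: Sequence[int], n: int) -> float:
    """Empirical average (1/n) * sum_{i=1}^{n} eta_i of a 0/1 payoff sequence."""
    return sum(eta[:n]) / n


def illustrative_gamma(n: int) -> float:
    """Hypothetical nonincreasing step sequence in (0, 1]; not from the paper."""
    return (n + 1) ** -0.5


def narendra_two_armed(
    eta_a: Sequence[int],
    eta_b: Sequence[int],
    gamma: Callable[[int], float],
    x0: float = 0.5,
    seed: int = 0,
) -> float:
    """Hypothetical linear reward-inaction sketch of a Narendra-type bandit.

    x is the current probability of playing arm A. eta_a[i] and eta_b[i] are
    the fixed (deterministic) 0/1 rewards arm A or arm B would pay at step i.
    gamma(n) must lie in [0, 1] so that x stays in [0, 1].
    Assumption: this update form and indexing may differ from the paper's.
    """
    rng = random.Random(seed)  # the only randomness: which arm is played
    x = x0
    for n, (reward_a, reward_b) in enumerate(zip(eta_a, eta_b), start=1):
        step = gamma(n)
        if rng.random() < x:          # play arm A with probability x
            if reward_a == 1:         # A paid off: move x toward 1
                x += step * (1.0 - x)
        else:                         # play arm B with probability 1 - x
            if reward_b == 1:         # B paid off: move x toward 0
                x -= step * x
        # an unrewarded play leaves x unchanged ("inaction")
    return x


if __name__ == "__main__":
    N = 100_000
    # Hypothetical periodic payoffs with Cesaro limits theta_A = 0.7, theta_B = 0.4.
    eta_a = [1 if i % 10 < 7 else 0 for i in range(N)]
    eta_b = [1 if i % 10 < 4 else 0 for i in range(N)]
    print("empirical theta_A:", cesaro_average(eta_a, N))
    print("empirical theta_B:", cesaro_average(eta_b, N))
    print("P(play A) after N steps:", narendra_two_armed(eta_a, eta_b, illustrative_gamma))
```

Periodic payoffs are used only because their empirical averages converge at rate $1/n$, which is faster than the $1/(\log n)^{1+\varepsilon}$ rate assumed in the abstract; any deterministic 0/1 sequences can be passed in instead.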
Pages: 457–476
Number of pages: 20