ON ERGODIC TWO-ARMED BANDITS

Cited by: 1
Authors
Tarres, Pierre [1]
Vandekerkhove, Pierre [2]
Affiliations
[1] Univ Toulouse, Inst Math, CNRS, F-31062 Toulouse 9, France
[2] Univ Paris Est, LAMA, F-77454 Champs Sur Marne 2, Marne La Vallee, France
Source
ANNALS OF APPLIED PROBABILITY | 2012, Vol. 22, No. 02
Keywords
Convergence; ergodicity; stochastic algorithms; two-armed bandit; ALGORITHM; AUTOMATA;
DOI
10.1214/10-AAP751
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Codes
020208; 070103; 0714;
Abstract
A device has two arms with unknown deterministic payoffs, and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show, under weak ergodicity assumptions on these deterministic payoffs, that the procedure eventually chooses the best arm (i.e., the one with the greatest Cesàro limit) with probability one, for appropriate step sequences of the algorithm. In the case of i.i.d. payoffs, this implies a "quenched" version of the "annealed" result of Lamberton, Pagès and Tarrès [Ann. Appl. Probab. 14 (2004) 1424-1454] by the law of the iterated logarithm, thus generalizing it. More precisely, if (η_{l,i})_{i∈ℕ} ∈ {0,1}^ℕ, l ∈ {A, B}, are the deterministic reward sequences we would get if we played at time i, we obtain infallibility with the same assumption on nonincreasing step sequences as in Lamberton, Pagès and Tarrès [Ann. Appl. Probab. 14 (2004) 1424-1454], replacing the i.i.d. assumption on the payoffs by the hypothesis that the empirical averages Σ_{i=1}^n η_{A,i}/n and Σ_{i=1}^n η_{B,i}/n converge, as n tends to infinity, to θ_A and θ_B, respectively, at rate at least 1/(log n)^{1+ε} for some ε > 0. We also show a fallibility result, that is, convergence with positive probability to the choice of the wrong arm, which implies the corresponding result of Lamberton, Pagès and Tarrès [Ann. Appl. Probab. 14 (2004) 1424-1454] in the i.i.d. case.
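The Narendra scheme studied here is the linear reward-inaction update: the probability x of playing arm A moves toward 1 after a successful play of A and toward 0 after a successful play of B, and is left unchanged on failure. A minimal sketch, assuming illustrative periodic payoff sequences and step sizes γ_n (these concrete choices are not taken from the paper):

```python
import random

def narendra(payoff_a, payoff_b, gamma, n_steps, x0=0.5, rng=None):
    """Linear reward-inaction (Narendra) scheme for a two-armed bandit.

    payoff_a, payoff_b: i -> {0, 1}, the deterministic reward each arm
    would give at time i; gamma: n -> step size in (0, 1).
    Returns the final probability x of playing arm A.
    """
    rng = rng or random.Random(0)
    x = x0  # current probability of playing arm A
    for n in range(1, n_steps + 1):
        g = gamma(n)
        if rng.random() < x:      # play arm A
            if payoff_a(n):       # success: move x toward 1
                x += g * (1 - x)
        else:                     # play arm B
            if payoff_b(n):       # success: move x toward 0
                x -= g * x
        # on a failed play, x is left unchanged ("inaction")
    return x

# Ergodic (periodic, hence Cesaro-convergent) payoffs: arm A pays on
# 4 of every 5 steps (mean 0.8), arm B on 3 of every 10 (mean 0.3).
x_final = narendra(lambda i: int(i % 5 != 0),
                   lambda i: int(i % 10 < 3),
                   lambda n: 1.0 / (n + 10),
                   n_steps=20_000)
print(x_final)  # probability of playing the better arm A
```

Because x only ever moves by a fraction g < 1 of its distance to 0 or 1, it stays in [0, 1]; the paper's infallibility result says that, for suitable step sequences, x converges to 1 (the arm with the larger Cesàro mean) with probability one, while the fallibility result says convergence to the wrong arm can occur with positive probability for other step sequences.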
Pages: 457-476
Page count: 20