ON ERGODIC TWO-ARMED BANDITS

Cited by: 1

Authors:

Tarres, Pierre [1]

Vandekerkhove, Pierre [2]

Affiliations:

[1] Univ Toulouse, Inst Math, CNRS, F-31062 Toulouse 9, France

[2] Univ Paris Est, LAMA, F-77454 Champs Sur Marne 2, Marne La Vallee, France

Source:

ANNALS OF APPLIED PROBABILITY | 2012 / Vol. 22 / No. 02

Keywords:

Convergence; ergodicity; stochastic algorithms; two-armed bandit; ALGORITHM; AUTOMATA;

DOI:

10.1214/10-AAP751

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Subject classification code:

020208 ; 070103 ; 0714 ;

Abstract:

A device has two arms with unknown deterministic payoffs, and the aim is to asymptotically identify the best one without spending too much time on the other. The Narendra algorithm offers a stochastic procedure to this end. We show under weak ergodic assumptions on these deterministic payoffs that the procedure eventually chooses the best arm (i.e., the one with greatest Cesàro limit) with probability one for appropriate step sequences of the algorithm. In the case of i.i.d. payoffs, this implies a "quenched" version of the "annealed" result of Lamberton, Pages and Tarres [Ann. Appl. Probab. 14 (2004) 1424-1454] by the law of the iterated logarithm, thus generalizing it. More precisely, if $(\eta_{\ell,i})_{i \in \mathbb{N}} \in \{0,1\}^{\mathbb{N}}$, $\ell \in \{A,B\}$, are the deterministic reward sequences we would get if we played at time $i$, we obtain infallibility with the same assumption on nonincreasing step sequences on the payoffs as in Lamberton, Pages and Tarres [Ann. Appl. Probab. 14 (2004) 1424-1454], replacing the i.i.d. assumption by the hypothesis that the empirical averages $\sum_{i=1}^{n} \eta_{A,i}/n$ and $\sum_{i=1}^{n} \eta_{B,i}/n$ converge, as $n$ tends to infinity, respectively, to $\theta_A$ and $\theta_B$, with rate at least $1/(\log n)^{1+\varepsilon}$, for some $\varepsilon > 0$. We also show a fallibility result, that is, convergence with positive probability to the choice of the wrong arm, which implies the corresponding result of Lamberton, Pages and Tarres [Ann. Appl. Probab. 14 (2004) 1424-1454] in the i.i.d. case.
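
For illustration only, the setting of the abstract can be simulated with a short Python sketch. The record does not state the update rule, so everything here is an assumption: the sketch uses the linear reward-inaction scheme commonly associated with the Narendra two-armed bandit algorithm, where x is the probability of playing arm A; the function name `narendra_two_armed`, the step sequence 1/(n+1), and the periodic payoff sequences are hypothetical choices. Periodic 0/1 sequences have empirical averages converging at rate O(1/n), which is faster than the rate required in the abstract.

```python
import random


def narendra_two_armed(eta_a, eta_b, step, x0=0.5, seed=0):
    """Sketch of a linear reward-inaction update (assumed form, not taken from the record).

    eta_a, eta_b: deterministic 0/1 reward sequences for arms A and B.
    step: function n -> step size in (0, 1), assumed nonincreasing in n.
    Returns the final probability x of playing arm A.
    """
    rng = random.Random(seed)
    x = x0
    for n, (reward_a, reward_b) in enumerate(zip(eta_a, eta_b), start=1):
        gamma = step(n)
        if rng.random() < x:
            # Arm A is played; x moves toward 1 only if A pays off.
            if reward_a:
                x += gamma * (1.0 - x)
        else:
            # Arm B is played; x moves toward 0 only if B pays off.
            if reward_b:
                x -= gamma * x
        # No reward means no update ("inaction").
    return x


def periodic(pattern, length):
    """Deterministic reward sequence obtained by repeating a 0/1 pattern."""
    return [pattern[i % len(pattern)] for i in range(length)]


if __name__ == "__main__":
    horizon = 100_000
    eta_a = periodic([1, 1, 1, 0], horizon)  # Cesaro limit 0.75 (hypothetical)
    eta_b = periodic([1, 0, 0, 0], horizon)  # Cesaro limit 0.25 (hypothetical)
    x_final = narendra_two_armed(eta_a, eta_b, step=lambda n: 1.0 / (n + 1))
    print(f"final probability of playing arm A: {x_final:.4f}")
```

Because each step size lies in (0, 1), every update is a convex combination of x with 0 or 1, so x stays in [0, 1]. A single run says nothing about the almost-sure statements in the abstract; the sketch only shows the mechanics of the procedure.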


Pages: 457-476

Page count: 20
