Risk-sensitive and risk-neutral multiarmed bandits

Cited by: 20
Authors
Denardo, Eric V.
Park, Haechurl
Rothblum, Uriel G.
Affiliations
[1] Yale Univ, Ctr Syst Sci, New Haven, CT 06520 USA
[2] Chung Ang Univ, Dept Business Adm, Seoul 156756, South Korea
[3] Technion Israel Inst Technol, Fac Ind Engn & Management, IL-32000 Haifa, Israel
Keywords
multiarmed bandits; exponential utility; risk-sensitive Markov decision processes; optimal stopping
DOI
10.1287/moor.1060.0240
Chinese Library Classification (CLC) number
C93 [Management science]; O22 [Operations research]
Subject classification codes
070105; 12; 1201; 1202; 120202
Abstract
For the multiarmed bandit, the classic result is probabilistic: each state of each bandit (Markov chain with rewards) has an index that is determined by an optimal stopping time for that state's bandit, and expected discounted income is maximized by playing at each epoch a bandit whose current state has the largest index. Our approach is analytic, not probabilistic. It uses pairwise comparison in place of stopping times. A simple recursion assigns to each state of each bandit a utility and an amplification of future utility that depend solely on the data for that state's bandit. These utilities and amplifications determine whether or not one state dominates another. We show that it is optimal to play at each epoch any bandit whose current state is not dominated by the current states of the other bandits. We obtain this result by a coherent analysis that encompasses three models: one with risk-averse exponential utility, one with risk-seeking exponential utility, and one with linear utility and either stopping or discounting. We also show that the risk-seeking case and a model of Nash [Nash, P. 1980. A generalized bandit problem. J. Roy. Statist. Soc. B 42 165-169] are equivalent to each other.
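The classic index result that the abstract starts from can be illustrated with a short sketch. This is not the authors' pairwise-comparison recursion (the abstract does not give its formulas); it computes the standard risk-neutral, discounted index, in which the index of state i is the largest ratio E[Σ_{t<τ} β^t r(x_t)] / E[Σ_{t<τ} β^t] over stopping times τ ≥ 1 with x_0 = i. The states are ranked by a largest-remaining-index procedure. The function names `gittins_indices` and `choose_bandit`, the input layout (transition matrix `P`, reward vector `r`, discount factor `beta`), and the use of NumPy are illustrative assumptions.

```python
import numpy as np


def gittins_indices(P, r, beta):
    """Classic (risk-neutral, discounted) index of every state of one bandit.

    Hypothetical helper. P: (n, n) transition matrix followed while the
    bandit is played; r: length-n rewards, r[i] earned when played in state i;
    beta: discount factor with 0 < beta < 1.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    index = np.empty(n)
    ranked = []                      # states already ranked, largest index first
    remaining = list(range(n))
    while remaining:
        C = ranked
        if C:
            # Expected discounted reward (R_C) and discounted time (W_C)
            # collected from each state of C until the chain first leaves C.
            A = np.eye(len(C)) - beta * P[np.ix_(C, C)]
            R_C = np.linalg.solve(A, r[C])
            W_C = np.linalg.solve(A, np.ones(len(C)))
        best_state, best_ratio = None, -np.inf
        for i in remaining:
            # Play state i once, then keep playing while the chain stays in C.
            if C:
                R_i = r[i] + beta * P[i, C] @ R_C
                W_i = 1.0 + beta * P[i, C] @ W_C
            else:
                R_i, W_i = r[i], 1.0
            if R_i / W_i > best_ratio:
                best_state, best_ratio = i, R_i / W_i
        # The best ratio among unranked states is the next-largest index.
        index[best_state] = best_ratio
        ranked.append(best_state)
        remaining.remove(best_state)
    return index


def choose_bandit(indices, current_states):
    """Index policy (hypothetical helper): play a bandit whose current state
    has the largest index."""
    scores = [idx[s] for idx, s in zip(indices, current_states)]
    return int(np.argmax(scores))


# Bandit A: state 0 pays 0 and moves to absorbing state 1, which pays 1.
idx_a = gittins_indices([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], beta=0.5)
# Bandit B: a single state paying 0.8 forever.
idx_b = gittins_indices([[1.0]], [0.8], beta=0.5)
print(idx_a, idx_b)
print(choose_bandit([idx_a, idx_b], [0, 0]))  # → 1 (bandit B)
```

For bandit A with beta = 0.5, the sketch gives indices 0.5 and 1.0: from state 0 the best stopping rule is never to stop, which earns discounted reward 1 over discounted time 2. In contrast, the paper replaces stopping-time ratios like these with per-state utilities and amplifications compared pairwise, and covers the risk-sensitive exponential-utility models as well.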
Pages: 374-394
Page count: 21
Related papers
  • [1] Cumulative Optimality in Risk-Sensitive and Risk-Neutral Markov Reward Chains
    Sladky, Karel
    Mathematical Methods in Economics 2013, Pts I and II, 2013: 814-819
  • [2] Risk-sensitive and risk-neutral control for continuous-time hidden Markov models
    James, MR
    Elliott, RJ
    Applied Mathematics and Optimization, 1996, 34(01): 37-50
  • [3] Solutions of the average cost optimality equation for finite Markov decision chains: risk-sensitive and risk-neutral criteria
    Cavazos-Cadena, Rolando
    Mathematical Methods of Operations Research, 2009, 70(03): 541-566
  • [4] Open Problem: Risk of Ruin in Multiarmed Bandits
    Perotto, Filipo Studzinski
    Bourgais, Mathieu
    Vercouter, Laurent
    da Silva, Bruno Castro
    Conference on Learning Theory, 2019, Vol 99
  • [5] A Risk-Neutral Default for Chemical Risk Management
    Hansson, Sven Ove
    Ruden, Christina
    American Journal of Industrial Medicine, 2008, 51(12): 964-967
  • [6] Downside risk-neutral probabilities
    Chaigneau, Pierre
    Eeckhoudt, Louis
    Economic Theory Bulletin, 2020, 8(01): 65-77
  • [7] Risk-Neutral Densities: A Review
    Figlewski, Stephen
    Annual Review of Financial Economics, 2018, 10: 329-359
  • [8] Risk-Sensitive and Risk Neutral Optimality in Markov Decision Chains; A Unified Approach
    Sladky, Karel
    Proceedings of the International Conference Quantitative Methods in Economics (Multiple Criteria Decision Making XVI), 2012: 201-205
  • [9] Risk-neutral economy and zero price of risk
    Vasicek, Oldrich Alfons
    Mathematics and Financial Economics, 2014, 8: 229-239