Exchangeability Characterizes Optimality of Sequential Normalized Maximum Likelihood and Bayesian Prediction

Cited by: 1
Authors
Hedayati, Fares [1 ,2 ,3 ]
Bartlett, Peter L. [4 ,5 ,6 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Bahai Inst Higher Educ, Dept Comp Engn, Tehran 11369, Iran
[3] Upwork, San Francisco, CA 94107 USA
[4] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA
[5] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[6] Queensland Univ Technol, Sch Math Sci, Brisbane, Qld 4000, Australia
Funding
Australian Research Council
Keywords
Online learning; logarithmic loss; Bayesian strategy; Jeffreys prior; asymptotic normality of maximum likelihood estimator;
DOI
10.1109/TIT.2017.2735799
CLC Classification Number
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
We study online learning under logarithmic loss with regular parametric models. In this setting, each strategy corresponds to a joint distribution on sequences. The minimax optimal strategy is the normalized maximum likelihood (NML) strategy. We show that the sequential NML (SNML) strategy predicts minimax optimally (i.e., as NML) if and only if the joint distribution on sequences defined by SNML is exchangeable. This property also characterizes the optimality of a Bayesian prediction strategy. In that case, the optimal prior distribution is Jeffreys prior for a broad class of parametric models for which the maximum likelihood estimator is asymptotically normal. The optimal prediction strategy, NML, depends on the number n of rounds of the game, in general. However, when a Bayesian strategy is optimal, NML becomes independent of n. Our proof uses this to exploit the asymptotics of NML. The asymptotic normality of the maximum likelihood estimator is responsible for the necessity of Jeffreys prior.
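For context, the following is a minimal sketch of the standard objects named in the abstract; the notation (a parametric family p_theta, sequences x^n = (x_1, ..., x_n), Fisher information I(theta), and a discrete outcome space so that sums rather than integrals appear) is standard usage assumed here, not taken from the record itself. The NML strategy is the minimax-optimal joint distribution over length-n sequences, SNML makes the analogous maximum-likelihood-normalized prediction one symbol at a time, and Jeffreys prior is proportional to the square root of the Fisher information determinant:

\[
  p_{\mathrm{NML}}(x^{n}) \;=\;
    \frac{\sup_{\theta} p_{\theta}(x^{n})}
         {\sum_{y^{n}} \sup_{\theta} p_{\theta}(y^{n})},
  \qquad
  p_{\mathrm{SNML}}(x_{t+1} \mid x^{t}) \;=\;
    \frac{\sup_{\theta} p_{\theta}(x^{t}, x_{t+1})}
         {\sum_{y} \sup_{\theta} p_{\theta}(x^{t}, y)},
  \qquad
  \pi_{\mathrm{Jeffreys}}(\theta) \;\propto\; \sqrt{\det I(\theta)}.
\]

The paper's result can then be read as: the one-step SNML predictions coincide with the minimax-optimal NML distribution exactly when the joint distribution they induce is exchangeable, and in that case the optimal Bayesian strategy uses Jeffreys prior.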
Pages: 6767-6773
Page count: 7