Exchangeability Characterizes Optimality of Sequential Normalized Maximum Likelihood and Bayesian Prediction

Cited by: 1

Authors:

Hedayati, Fares [1, 2, 3]

Bartlett, Peter L. [4, 5, 6]

Affiliations:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

[2] Bahai Inst Higher Educ, Dept Comp Engn, Tehran 11369, Iran

[3] Upwork, San Francisco, CA 94107 USA

[4] Univ Calif Berkeley, Div Comp Sci, Berkeley, CA 94720 USA

[5] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA

[6] Queensland Univ Technol, Sch Math Sci, Brisbane, Qld 4000, Australia

Source:

IEEE TRANSACTIONS ON INFORMATION THEORY | 2017 / Vol. 63 / No. 10

Funding:

Australian Research Council;

Keywords:

Online learning; logarithmic loss; Bayesian strategy; Jeffreys prior; asymptotic normality of maximum likelihood estimator;

DOI:

10.1109/TIT.2017.2735799

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

We study online learning under logarithmic loss with regular parametric models. In this setting, each strategy corresponds to a joint distribution on sequences. The minimax optimal strategy is the normalized maximum likelihood (NML) strategy. We show that the sequential NML (SNML) strategy predicts minimax optimally (i.e., as NML) if and only if the joint distribution on sequences defined by SNML is exchangeable. This property also characterizes the optimality of a Bayesian prediction strategy. In that case, the optimal prior distribution is Jeffreys prior for a broad class of parametric models for which the maximum likelihood estimator is asymptotically normal. The optimal prediction strategy, NML, depends on the number n of rounds of the game, in general. However, when a Bayesian strategy is optimal, NML becomes independent of n. Our proof uses this to exploit the asymptotics of NML. The asymptotic normality of the maximum likelihood estimator is responsible for the necessity of Jeffreys prior.
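The exchangeability criterion in the abstract can be checked by brute force on a toy model. The sketch below is an illustration only and is not taken from the paper: it assumes a Bernoulli model over binary sequences, and the function names (`max_likelihood`, `nml_joint`, `snml_joint`, `is_exchangeable`) are hypothetical. It builds the NML distribution for a fixed horizon n, builds the joint distribution obtained by chaining SNML conditionals (each proportional to the maximized likelihood of the extended sequence), and tests whether each joint is invariant under permutations of the sequence.

```python
from itertools import permutations, product


def max_likelihood(seq):
    """Sup over theta of the Bernoulli likelihood of seq, attained at theta = k/n."""
    n, k = len(seq), sum(seq)
    if n == 0:
        return 1.0
    theta = k / n
    # Python evaluates 0.0 ** 0 as 1.0, so the all-zeros and all-ones cases are fine.
    return theta ** k * (1 - theta) ** (n - k)


def nml_joint(n):
    """NML for horizon n: maximized likelihood normalized over all length-n sequences."""
    seqs = list(product((0, 1), repeat=n))
    total = sum(max_likelihood(s) for s in seqs)
    return {s: max_likelihood(s) / total for s in seqs}


def snml_joint(n):
    """Joint distribution on length-n sequences defined by chaining SNML conditionals."""
    joint = {(): 1.0}
    for _ in range(n):
        extended = {}
        for prefix, p in joint.items():
            # Normalizer of the SNML conditional given this prefix.
            norm = sum(max_likelihood(prefix + (y,)) for y in (0, 1))
            for x in (0, 1):
                extended[prefix + (x,)] = p * max_likelihood(prefix + (x,)) / norm
        joint = extended
    return joint


def is_exchangeable(joint, tol=1e-12):
    """True if every reordering of a sequence has the same probability as the sequence."""
    for seq, p in joint.items():
        for perm in permutations(seq):
            if abs(joint[perm] - p) > tol:
                return False
    return True


if __name__ == "__main__":
    n = 3
    print("NML exchangeable: ", is_exchangeable(nml_joint(n)))
    print("SNML exchangeable:", is_exchangeable(snml_joint(n)))
```

In this toy setting with n = 3, the SNML joint assigns 0.05 to the sequence 010 but 0.4 · 4/31 ≈ 0.0516 to 001, so it is not exchangeable, which by the stated characterization means SNML does not coincide with NML for this model at that horizon. The NML joint depends on the sequence only through its count of ones and is therefore exchangeable.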


Pages: 6767-6773

Page count: 7

