Coverage-adjusted entropy estimation

Cited by: 34
Authors
Vu, Vincent Q. [1 ]
Yu, Bin [1]
Kass, Robert E. [2,3]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Carnegie Mellon Univ, Dept Stat, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Ctr Neural Basis Cognit, Pittsburgh, PA 15123 USA
Keywords
entropy estimation; neuronal data; spike train;
DOI
10.1002/sim.2942
Chinese Library Classification (CLC)
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Data on 'neural coding' have frequently been analyzed using information-theoretic measures. These formulations involve the fundamental and generally difficult statistical problem of estimating entropy. We review briefly several methods that have been advanced to estimate entropy and highlight a method, the coverage-adjusted entropy estimator (CAE), due to Chao and Shen that appeared recently in the environmental statistics literature. This method begins with the elementary Horvitz-Thompson estimator, developed for sampling from a finite population, and adjusts for the potential new species that have not yet been observed in the sample; these become the new patterns or 'words' in a spike train that have not yet been observed. The adjustment is due to I. J. Good, and is called the Good-Turing coverage estimate. We provide a new empirical regularization derivation of the coverage-adjusted probability estimator, which shrinks the maximum likelihood estimate. We prove that the CAE is consistent and first-order optimal, with rate O_P(1/log n), in the class of distributions with finite entropy variance and that, within the class of distributions with finite qth moment of the log-likelihood, the Good-Turing coverage estimate and the total probability of unobserved words converge at rate O_P(1/(log n)^q). We then provide a simulation study of the estimator with standard distributions and examples from neuronal data, where observations are dependent. The results show that, with a minor modification, the CAE performs much better than the MLE and is better than the best upper bound estimator, due to Paninski, when the number of possible words m is unknown or infinite. Copyright (c) 2007 John Wiley & Sons, Ltd.
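The abstract describes the estimator concretely enough to sketch: shrink the MLE word probabilities by the Good-Turing coverage estimate, then apply the Horvitz-Thompson inclusion-probability correction. Below is a minimal Python sketch of the Chao-Shen CAE in that form; the function name cae_entropy, the use of NumPy, and the singleton fallback (guarding the degenerate case where every observed word occurs once) are assumptions made for illustration, not the paper's 'minor modification'.

import numpy as np

def cae_entropy(counts):
    """Coverage-adjusted entropy estimate (Chao-Shen form), in nats.

    counts: occurrence counts of the distinct observed words.
    """
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]
    n = counts.sum()                    # sample size
    f1 = np.sum(counts == 1.0)          # number of singleton words
    # Good-Turing coverage estimate: estimated total probability
    # of the words observed at least once in the sample.
    c_hat = 1.0 - f1 / n
    if c_hat <= 0.0:                    # every word is a singleton;
        c_hat = 1.0 - (f1 - 1.0) / n    # ad hoc guard (assumption, not the paper's fix)
    # Coverage-adjusted probabilities: shrink the MLE counts/n,
    # reserving mass 1 - c_hat for words not yet observed.
    p_tilde = c_hat * counts / n
    # Horvitz-Thompson correction: weight each term by the inverse of the
    # probability that the word appears at least once in a sample of size n.
    incl = 1.0 - (1.0 - p_tilde) ** n
    return -np.sum(p_tilde * np.log(p_tilde) / incl)

# Toy comparison on a distribution with infinitely many possible "words":
rng = np.random.default_rng(0)
sample = rng.geometric(p=0.3, size=200)
_, counts = np.unique(sample, return_counts=True)
p_mle = counts / counts.sum()
print("MLE plug-in:", -np.sum(p_mle * np.log(p_mle)))
print("CAE:        ", cae_entropy(counts))

On large or infinite alphabets the plug-in MLE is biased downward because unseen words contribute nothing; the coverage shrinkage and the inclusion-probability weighting both counteract that bias, which is the behavior the paper's simulation study examines.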
Pages: 4039-4060
Number of pages: 22