Data on 'neural coding' have frequently been analyzed using information-theoretic measures. These formulations involve the fundamental and generally difficult statistical problem of estimating entropy. We briefly review several methods that have been advanced to estimate entropy and highlight a method, the coverage-adjusted entropy estimator (CAE), due to Chao and Shen, that appeared recently in the environmental statistics literature. This method begins with the elementary Horvitz-Thompson estimator, developed for sampling from a finite population, and adjusts for the potential new species that have not yet been observed in the sample; in the neural setting, these are the new patterns or 'words' in a spike train that have not yet been observed. The adjustment, due to I. J. Good, is called the Good-Turing coverage estimate. We provide a new empirical regularization derivation of the coverage-adjusted probability estimator, which shrinks the maximum likelihood estimate. We prove that the CAE is consistent and first-order optimal, with rate O_P(1/log n), in the class of distributions with finite entropy variance, and that, within the class of distributions with finite qth moment of the log-likelihood, the Good-Turing coverage estimate and the total probability of unobserved words converge at rate O_P(1/(log n)^q). We then provide a simulation study of the estimator with standard distributions and examples from neuronal data, where observations are dependent. The results show that, with a minor modification, the CAE performs much better than the MLE and is better than the best upper bound estimator, due to Paninski, when the number of possible words m is unknown or infinite.
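To make the construction concrete, the following is a minimal sketch (not the authors' code) of a coverage-adjusted entropy estimate as described above: the Good-Turing coverage estimate shrinks the maximum likelihood word probabilities, and a Horvitz-Thompson style weight corrects for words that may never appear in the sample. The function and variable names are illustrative assumptions, and the "minor modification" mentioned in the abstract is not implemented here.

```python
# Illustrative sketch of a coverage-adjusted (Chao-Shen style) entropy
# estimator, assuming counts of observed spike-train 'words' are given.
import numpy as np

def coverage_adjusted_entropy(counts):
    """Return an entropy estimate (in nats) from observed word counts.

    counts: nonnegative integer counts of the distinct observed words.
    Assumes at least one word occurs more than once (so coverage > 0);
    the degenerate all-singletons case would need separate handling.
    """
    counts = np.asarray(counts, dtype=float)
    counts = counts[counts > 0]
    n = counts.sum()                      # total sample size
    f1 = np.sum(counts == 1)              # number of singleton words
    # Good-Turing estimate of sample coverage: 1 - fraction of singletons
    C_hat = 1.0 - f1 / n
    # Coverage-adjusted probabilities shrink the MLE p_hat = counts / n
    p_tilde = C_hat * counts / n
    # Horvitz-Thompson weight: probability that a word with probability
    # p_tilde appears at least once in a sample of size n
    inclusion = 1.0 - (1.0 - p_tilde) ** n
    return -np.sum(p_tilde * np.log(p_tilde) / inclusion)

# Example: counts of distinct words observed in a small sample
print(coverage_adjusted_entropy([10, 5, 3, 1, 1, 1]))
```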