High-Dimensional Covariance Estimation From a Small Number of Samples

Authors
Vishny, David [1]
Morzfeld, Matthias [1]
Gwirtz, Kyle [2]
Bach, Eviatar [3, 4]
Dunbar, Oliver R. A. [3]
Hodyss, Daniel [5]
Affiliations
[1] Univ Calif San Diego, Scripps Inst Oceanog, San Diego, CA 92093 USA
[2] NASA, Goddard Space Flight Ctr, Greenbelt, MD USA
[3] CALTECH, Pasadena, CA USA
[4] Univ Reading, Reading, England
[5] Naval Res Lab, Remote Sensing Div, Washington, DC USA
Funding
US National Science Foundation
Keywords
ENSEMBLE KALMAN FILTER; ELECTROMAGNETIC GEOPHYSICAL-DATA; VARIATIONAL DATA ASSIMILATION; UNCERTAINTY QUANTIFICATION; REGULARIZED INVERSION; LOCALIZATION APPROACH; POSTERIOR INFLATION; MATRICES; ERRORS; NWP;
DOI
10.1029/2024MS004417
Chinese Library Classification number
P4 [Atmospheric sciences (meteorology)]
Discipline classification code
0706; 070601
Abstract
We synthesize knowledge from numerical weather prediction, inverse theory, and statistics to address the problem of estimating a high-dimensional covariance matrix from a small number of samples. This problem is fundamental in statistics, machine learning/artificial intelligence, and in modern Earth science. We create several new adaptive methods for high-dimensional covariance estimation, but one method, which we call Noise-Informed Covariance Estimation (NICE), stands out because it has three important properties: (a) NICE is conceptually simple and computationally efficient; (b) NICE guarantees symmetric positive semi-definite covariance estimates; and (c) NICE is largely tuning-free. We illustrate the use of NICE on a large set of Earth science-inspired numerical examples, including cycling data assimilation, inversion of geophysical field data, and training of feed-forward neural networks with time-averaged data from a chaotic dynamical system. Our theory, heuristics, and numerical tests suggest that NICE may indeed be a viable option for high-dimensional covariance estimation in many Earth science problems.

Plain Language Summary
Models of physical processes must be fitted to real-world data before they are useful for prediction. In some cases, the most practical way to fit models to data is to run a set, or ensemble, of simulations with different physics or initial conditions. One then uses the covariances among the inputs and outputs to modify the simulations so that they fit the data better. To reduce noise in the covariances, one ideally uses an ensemble size that is larger than the number of unknown variables, but this becomes impractical when the number of unknowns is large. To improve the performance of this fitting process when the ensemble size is small, one can discount covariances between variables that are likely due to noise. We introduce several methods of covariance estimation that determine the degree to which covariances are discounted based on expected levels of noise. All new methods perform well on a series of Earth science-inspired problems, but we highlight one method that preserves a key property of covariance matrices at a low computational cost.

Key Points
• We introduce several methods of covariance matrix estimation that adaptively select regularization parameters based on estimates of sampling error.
• One method, Noise-Informed Covariance Estimation, stands out because it guarantees a positive semi-definite estimator at a low computational cost.
• All new covariance estimation methods perform well on a large variety of test problems.
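
The general idea of discounting covariances that are likely due to noise can be illustrated with a generic, fixed (non-adaptive) localization taper. The sketch below is not NICE and not any other method from the paper; it only shows why a small ensemble yields a rank-deficient sample covariance and how an element-wise taper can reduce the error while keeping the estimate symmetric positive semi-definite. The function names, the 1-D grid, the exponential "true" covariance, and the taper length scale are all assumptions chosen for illustration.

```python
import numpy as np


def sample_covariance(ensemble):
    """Unbiased sample covariance of an (n_members, n_vars) ensemble."""
    anomalies = ensemble - ensemble.mean(axis=0)
    return anomalies.T @ anomalies / (ensemble.shape[0] - 1)


def gaussian_taper(n_vars, length_scale):
    """Hypothetical fixed taper: weights decay with index distance |i - j|."""
    idx = np.arange(n_vars)
    dist = np.abs(idx[:, None] - idx[None, :])
    return np.exp(-0.5 * (dist / length_scale) ** 2)


# Assumed toy problem: 50 variables on a 1-D grid, only 10 ensemble members.
rng = np.random.default_rng(0)
n_vars, n_members = 50, 10
idx = np.arange(n_vars)
true_cov = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 3.0)

# Draw the ensemble from N(0, true_cov) using a Cholesky factor.
chol = np.linalg.cholesky(true_cov)
ensemble = rng.standard_normal((n_members, n_vars)) @ chol.T

sample = sample_covariance(ensemble)
# Element-wise (Schur) product with a PSD taper keeps the estimate symmetric PSD.
localized = gaussian_taper(n_vars, length_scale=5.0) * sample

err_sample = np.linalg.norm(sample - true_cov)
err_localized = np.linalg.norm(localized - true_cov)
print(f"rank of sample covariance: {np.linalg.matrix_rank(sample)}")
print(f"Frobenius error, sample:    {err_sample:.2f}")
print(f"Frobenius error, localized: {err_localized:.2f}")
```

In this toy setup the taper length scale is picked by hand. The methods summarized above differ in that they select the amount of discounting adaptively from estimates of sampling error, which is what makes them largely tuning-free.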
Pages: 30