High-Dimensional Covariance Estimation From a Small Number of Samples

Times cited: 0
Authors
Vishny, David [1]
Morzfeld, Matthias [1]
Gwirtz, Kyle [2]
Bach, Eviatar [3, 4]
Dunbar, Oliver R. A. [3]
Hodyss, Daniel [5]
Affiliations
[1] Univ Calif San Diego, Scripps Inst Oceanog, San Diego, CA 92093 USA
[2] NASA, Goddard Space Flight Ctr, Greenbelt, MD USA
[3] CALTECH, Pasadena, CA USA
[4] Univ Reading, Reading, England
[5] Naval Res Lab, Remote Sensing Div, Washington, DC USA
Funding
National Science Foundation (USA);
Keywords
ENSEMBLE KALMAN FILTER; ELECTROMAGNETIC GEOPHYSICAL-DATA; VARIATIONAL DATA ASSIMILATION; UNCERTAINTY QUANTIFICATION; REGULARIZED INVERSION; LOCALIZATION APPROACH; POSTERIOR INFLATION; MATRICES; ERRORS; NWP;
DOI
10.1029/2024MS004417
Chinese Library Classification
P4 [Atmospheric sciences (meteorology)];
Discipline classification codes
0706; 070601;
Abstract
We synthesize knowledge from numerical weather prediction, inverse theory, and statistics to address the problem of estimating a high-dimensional covariance matrix from a small number of samples. This problem is fundamental in statistics, machine learning/artificial intelligence, and in modern Earth science. We create several new adaptive methods for high-dimensional covariance estimation, but one method, which we call Noise-Informed Covariance Estimation (NICE), stands out because it has three important properties: (a) NICE is conceptually simple and computationally efficient; (b) NICE guarantees symmetric positive semi-definite covariance estimates; and (c) NICE is largely tuning-free. We illustrate the use of NICE on a large set of Earth science-inspired numerical examples, including cycling data assimilation, inversion of geophysical field data, and training of feed-forward neural networks with time-averaged data from a chaotic dynamical system. Our theory, heuristics and numerical tests suggest that NICE may indeed be a viable option for high-dimensional covariance estimation in many Earth science problems.

Plain Language Summary
Models of physical processes must be fitted to real-world data before they are useful for prediction. In some cases, the most practical way to fit models to data is to run a set, or ensemble, of simulations with different physics or initial conditions. One then uses the covariances among the inputs and outputs to modify the simulations so that they fit the data better. To reduce noise in the covariances, one ideally uses an ensemble size that is larger than the number of unknown variables, but this becomes impractical when the number of unknowns is large. To improve the performance of this fitting process when the ensemble size is small, one can discount covariances between variables that are likely due to noise. We introduce several methods of covariance estimation that determine the degree to which covariances are discounted based on expected levels of noise. All new methods perform well on a series of Earth science-inspired problems, but we highlight one method that preserves a key property of covariance matrices at a low computational cost.

Key Points
  • We introduce several methods of covariance matrix estimation that adaptively select regularization parameters based on estimates of sampling error.
  • One method, Noise-Informed Covariance Estimation, stands out because it guarantees a positive semi-definite estimator at a low computational cost.
  • All new covariance estimation methods perform well on a large variety of test problems.
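The idea of discounting noisy covariances when the ensemble is small can be illustrated with a minimal sketch. This is not NICE and not any method from the paper: it is a generic fixed shrinkage of the off-diagonal entries toward zero, and the function name `shrink_offdiagonal`, the weight `alpha`, and the synthetic identity-covariance test case are all assumptions made for illustration. The paper's methods choose the amount of discounting adaptively; here `alpha` is simply set by hand.

```python
import numpy as np


def shrink_offdiagonal(samples, alpha):
    """Damp off-diagonal sample covariances by a factor (1 - alpha).

    samples: array of shape (n_samples, n_vars), one ensemble member per row.
    alpha:   hypothetical hand-picked weight in [0, 1]; not an adaptive choice.
    """
    s = np.cov(samples, rowvar=False)  # sample covariance, shape (n_vars, n_vars)
    # Convex combination of two positive semi-definite matrices, so the
    # result is symmetric positive semi-definite as well.
    return (1.0 - alpha) * s + alpha * np.diag(np.diag(s))


# Synthetic example (assumption): 100 uncorrelated variables, only 20 samples,
# so every nonzero off-diagonal entry of the sample covariance is pure noise.
rng = np.random.default_rng(0)
n_vars, n_samples = 100, 20
truth = np.eye(n_vars)
samples = rng.standard_normal((n_samples, n_vars))

raw = np.cov(samples, rowvar=False)
damped = shrink_offdiagonal(samples, alpha=0.8)

err_raw = np.linalg.norm(raw - truth)        # Frobenius-norm error, no discounting
err_damped = np.linalg.norm(damped - truth)  # Frobenius-norm error, with discounting
min_eig = np.linalg.eigvalsh(damped).min()   # smallest eigenvalue of the estimate
```

In this synthetic setting the damped estimate is closer to the true covariance than the raw sample covariance, and its smallest eigenvalue stays non-negative. With a true covariance that has strong off-diagonal structure, a fixed `alpha` this large would remove real signal, which is the motivation for choosing the discount from the expected noise level.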
Pages: 30