Testing homogeneity in high dimensional data through random projections

Cited by: 1
Authors
Qiu, Tao [1]
Zhang, Qintong [2]
Fang, Yuanyuan [3]
Xu, Wangli [4, 5]
Affiliations
[1] Beijing Normal Univ, Ctr Stat & Data Sci, Zhuhai 519087, Peoples R China
[2] Beijing Normal Univ, Huitong Coll, Zhuhai 519087, Peoples R China
[3] Beijing Normal Univ, Sch Stat, Beijing 100875, Peoples R China
[4] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China
[5] Renmin Univ China, Sch Stat, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cramer-von Mises test; High dimension; Random projections; Two-sample test; 2-SAMPLE TEST; MULTIVARIATE; DISTRIBUTIONS; DISTANCE; ROBUST
DOI
10.1016/j.jmva.2023.105252
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Testing for homogeneity of two random vectors is a fundamental problem in statistics. Over the past two decades, numerous efforts have been made to detect heterogeneity when the random vectors are multivariate or even high dimensional. Owing to the "curse of dimensionality", existing tests based on Euclidean distance may fail to capture the overall homogeneity in high dimensional settings and can only capture moment discrepancies. To address this issue, we propose a fully nonparametric test for homogeneity of two random vectors. Our method randomly selects two subspaces consisting of components of the vectors, projects each subspace onto a one-dimensional space, and constructs the test statistic from the Cramer-von Mises distance between the projections. To enhance performance, we repeat this procedure many times to construct the final test statistic. Theoretically, as the number of replications tends to infinity, we avoid the potential power loss caused by poorly chosen projection directions. By U-statistic theory, the asymptotic null distribution of the proposed test is standard normal, regardless of the parent distributions of the random samples and of the relationship between the data dimensions and the sample sizes. As a result, no resampling procedure is needed to determine critical values. The empirical size and power of the proposed test are demonstrated through numerical simulations.
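The project-then-compare idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' statistic: it assumes a single random subset of components per replicate, a Gaussian random direction within that subset, and the classical two-sample Cramer-von Mises statistic on the projected data, averaged over replicates. The function names `cvm_two_sample` and `random_projection_cvm` and the parameters `k`, `reps`, and `seed` are hypothetical, and the standard-normal calibration from the paper is not reproduced here.

```python
import bisect
import math
import random


def cvm_two_sample(x, y):
    """Classical two-sample Cramer-von Mises statistic for 1-D samples."""
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = 0.0
    for z in xs + ys:
        # Empirical CDF of each sample evaluated at the pooled point z.
        f = bisect.bisect_right(xs, z) / n
        g = bisect.bisect_right(ys, z) / m
        total += (f - g) ** 2
    return n * m / (n + m) ** 2 * total


def random_projection_cvm(X, Y, k=5, reps=200, seed=0):
    """Average the 1-D statistic over random subsets and directions.

    X and Y are lists of equal-length rows (one row per observation).
    Illustrative simplification, not the statistic proposed in the paper.
    """
    rng = random.Random(seed)
    p = len(X[0])
    k = min(k, p)
    total = 0.0
    for _ in range(reps):
        idx = rng.sample(range(p), k)           # random subset of components
        w = [rng.gauss(0.0, 1.0) for _ in idx]  # random direction in that subset
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Project every observation onto the chosen direction.
        px = [sum(row[j] * v for j, v in zip(idx, w)) for row in X]
        py = [sum(row[j] * v for j, v in zip(idx, w)) for row in Y]
        total += cvm_two_sample(px, py)
    return total / reps
```

Calling `random_projection_cvm(X, Y)` on two samples returns a non-negative number that is zero when the samples coincide and grows as the projected distributions separate. Averaging over many replicates reflects the abstract's point that repetition guards against any single uninformative direction.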
Pages: 17
Related papers
50 in total
  • [31] On conditional moments of high-dimensional random vectors given lower-dimensional projections
    Steinberger, Lukas
    Leeb, Hannes
    BERNOULLI, 2018, 24 (01) : 565 - 591
  • [32] Measuring the quality of projections of high-dimensional labeled data
    Benato, Barbara C.
    Falcao, Alexandre X.
    Telea, Alexandru C.
    COMPUTERS & GRAPHICS-UK, 2023, 116 : 287 - 297
  • [33] Guided Projections for Analyzing the Structure of High-Dimensional Data
    Ortner, Thomas
    Filzmoser, Peter
    Rohm, Maia
    Breiteneder, Christian
    Brodinova, Sarka
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2018, 27 (04) : 750 - 762
  • [34] Homogeneity and Sparsity Analysis for High-Dimensional Panel Data Models
    Wang, Wu
    Zhu, Zhongyi
    JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2024, 42 (01) : 26 - 35
  • [35] Random Projections and Sampling Algorithms for Clustering of High-Dimensional Polygonal Curves
    Meintrup, Stefan
    Munteanu, Alexander
    Rohde, Dennis
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [36] Large deviations for high-dimensional random projections of ℓ_p^n-balls
    Alonso-Gutierrez, David
    Prochno, Joscha
    Thaele, Christoph
    ADVANCES IN APPLIED MATHEMATICS, 2018, 99 : 1 - 35
  • [37] Gaussian fluctuations for high-dimensional random projections of ℓ_p^n-balls
    Alonso-Gutierrez, David
    Prochno, Joscha
    Thaele, Christoph
    BERNOULLI, 2019, 25 (4A) : 3139 - 3174
  • [38] Homogeneity tests of covariance matrices with high-dimensional longitudinal data
    Zhong, Ping-Shou
    Li, Runze
    Santo, Shawn
    BIOMETRIKA, 2019, 106 (03) : 619 - 634
  • [39] Distributed high dimensional information theoretical image registration via random projections
    Szabo, Zoltan
    Lorincz, Andras
    DIGITAL SIGNAL PROCESSING, 2012, 22 (06) : 894 - 902
  • [40] RANDOM PROJECTIONS AND HOTELLING'S T^2 STATISTICS FOR CHANGE DETECTION IN HIGH-DIMENSIONAL DATA STREAMS
    Skubalska-Rafajlowicz, Ewa
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2013, 23 (02) : 447 - 461