Testing homogeneity in high dimensional data through random projections

Cited by: 1
Authors
Qiu, Tao [1]
Zhang, Qintong [2]
Fang, Yuanyuan [3]
Xu, Wangli [4, 5]
Affiliations
[1] Beijing Normal Univ, Ctr Stat & Data Sci, Zhuhai 519087, Peoples R China
[2] Beijing Normal Univ, Huitong Coll, Zhuhai 519087, Peoples R China
[3] Beijing Normal Univ, Sch Stat, Beijing 100875, Peoples R China
[4] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China
[5] Renmin Univ China, Sch Stat, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cramer-von Mises test; High dimension; Random projections; Two-sample test; 2-SAMPLE TEST; MULTIVARIATE; DISTRIBUTIONS; DISTANCE; ROBUST;
DOI
10.1016/j.jmva.2023.105252
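Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714;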
Abstract
Testing for homogeneity of two random vectors is a fundamental problem in statistics. Over the past two decades, numerous efforts have been made to detect heterogeneity when the random vectors are multivariate or even high dimensional. Owing to the "curse of dimensionality", existing tests based on Euclidean distance may fail to capture overall homogeneity in high dimensional settings and can detect only moment discrepancies. To address this issue, we propose a fully nonparametric test for homogeneity of two random vectors. Our method randomly selects two subspaces consisting of components of the vectors, projects each subspace onto a one-dimensional space, and constructs the test statistic from the Cramer-von Mises distance between the projections. To enhance performance, we repeat this procedure and combine the replications into the final test statistic. Theoretically, as the number of replications tends to infinity, the potential power loss caused by poorly chosen projection directions is avoided. By U-statistic theory, the asymptotic null distribution of the proposed test is standard normal, regardless of the parent distributions of the random samples and of the relationship between data dimensions and sample sizes. As a result, no resampling procedure is needed to determine critical values. The empirical size and power of the proposed test are demonstrated through numerical simulations.
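The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' statistic: it assumes both samples share the same dimension, applies the same randomly chosen coordinate subset and random unit direction to both samples, uses the classical two-sample Cramer-von Mises statistic on the projections, and simply averages over replications. The U-statistic construction and the standardization that yield the standard normal null distribution are not reproduced here. The function names and the parameters `k`, `n_rep`, and `seed` are hypothetical.

```python
import numpy as np


def cvm_two_sample(x, y):
    """Classical two-sample Cramer-von Mises statistic for 1-D samples."""
    n, m = len(x), len(y)
    pooled = np.concatenate([x, y])
    # Empirical CDFs of each sample, evaluated at every pooled observation.
    F = np.searchsorted(np.sort(x), pooled, side="right") / n
    G = np.searchsorted(np.sort(y), pooled, side="right") / m
    return n * m / (n + m) ** 2 * np.sum((F - G) ** 2)


def projected_cvm_statistic(X, Y, k=5, n_rep=200, seed=0):
    """Average the 1-D statistic over random coordinate subsets and directions.

    Assumption (not from the source): the same subset and direction are
    applied to both samples, and replications are combined by a plain mean.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    k = min(k, p)
    stats = np.empty(n_rep)
    for r in range(n_rep):
        idx = rng.choice(p, size=k, replace=False)  # random components
        d = rng.standard_normal(k)
        d /= np.linalg.norm(d)                      # random unit direction
        stats[r] = cvm_two_sample(X[:, idx] @ d, Y[:, idx] @ d)
    return stats.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 50))
    Y_same = rng.standard_normal((100, 50))
    Y_shift = rng.standard_normal((100, 50)) + 2.0
    print("same distribution:   ", projected_cvm_statistic(X, Y_same))
    print("shifted distribution:", projected_cvm_statistic(X, Y_shift))
```

Averaging over many replications reflects the abstract's point that a single unlucky direction can hide a difference between the distributions; the raw average in this sketch is only a descriptive quantity and would still need the paper's normalization before it could be compared with normal critical values.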
Pages: 17
Related papers
50 in total
  • [1] Iterative random projections for high-dimensional data clustering
    Cardoso, Angelo
    Wichert, Andreas
    PATTERN RECOGNITION LETTERS, 2012, 33 (13) : 1749 - 1755
  • [2] Random projections versus random selection of features for classification of high dimensional data
    Mylavarapu, Sachin
    Kaban, Ata
    2013 13TH UK WORKSHOP ON COMPUTATIONAL INTELLIGENCE (UKCI), 2013, : 305 - 312
  • [3] Resistant estimates for high dimensional and functional data based on random projections
    Fraiman, Ricardo
    Svarc, Marcela
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2013, 58 : 326 - 338
  • [4] Testing high-dimensional covariance matrices with random projections and corrected likelihood ratio
    Sun, Nan
    Tang, Cheng Yong
    STATISTICS AND ITS INTERFACE, 2022, 15 (04) : 449 - 461
  • [5] Combining ELM with Random Projections for Low and High Dimensional Data Classification and Clustering
    Alshamiri, Abobakr Khalil
    Singh, Alok
    Surampudi, Bapi Raju
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON FUZZY AND NEURO COMPUTING (FANCCO - 2015), 2015, 415 : 89 - 107
  • [6] Exploring high-dimensional data through locally enhanced projections
    Lai, Chufan
    Zhao, Ying
    Yuan, Xiaoru
    JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2018, 48 : 144 - 156
  • [7] Visualizing High-dimensional single-cell RNA-sequencing data through multiple Random Projections
    Tasoulis, Sotiris K.
    Vrahatis, Aristidis G.
    Georgakopoulos, Spiros V.
    Plagianakos, Vassilis P.
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 5448 - 5450
  • [8] Optimal projections of high dimensional data
    Corchado, E
    Fyfe, C
    2002 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2002, : 589 - 596
  • [9] An efficient approach for feature construction of high-dimensional microarray data by random projections
    Tariq, Hassan
    Eldridge, Elf
    Welch, Ian
    PLOS ONE, 2018, 13 (04):
  • [10] TESTING HOMOGENEITY OF HIGH-DIMENSIONAL COVARIANCE MATRICES
    Zheng, Shurong
    Lin, Ruitao
    Guo, Jianhua
    Yin, Guosheng
    STATISTICA SINICA, 2020, 30 (01) : 35 - 53