A New Graph-Based Two-Sample Test for Multivariate and Object Data

Times cited: 69
Authors
Chen, Hao [1]
Friedman, Jerome H. [2]
Affiliations
[1] Univ Calif Davis, Dept Stat, 4218 Math Sci, Davis, CA 95616 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Science Foundation (US)
Keywords
General alternatives; Nonparametrics; Permutation null distribution; Similarity graph; COVARIATE BALANCE; SMIRNOV; DISTRIBUTIONS; NETWORK; SAMPLE;
DOI
10.1080/01621459.2016.1147356
Chinese Library Classification (CLC) numbers
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Two-sample tests for multivariate data and especially for non-Euclidean data are not well explored. This article presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can be applied to multivariate data and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined, which can usually be provided by domain experts. Existing tests based on a similarity graph lack power either for location or for scale alternatives. The new test uses a common pattern that was overlooked previously, and works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well under finite samples, facilitating its application to large datasets. The new test is illustrated on two applications: The assessment of covariate balance in a matched observational study, and the comparison of network data under different conditions.
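The abstract describes the method only at a high level. As a rough illustration of the general recipe for a graph-based two-sample test, the sketch below pools the two samples, builds a similarity graph from a user-supplied dissimilarity, counts the edges that join two observations from the same sample, and calibrates against the permutation null. Several choices here are assumptions of this sketch and are not taken from the abstract. The graph is a minimum spanning tree, picked only for illustration. The statistic is a Mahalanobis-type quadratic form in the two within-sample edge counts. The permutation mean and covariance are estimated by Monte Carlo rather than analytically. The function names `mst_edges`, `within_counts`, and `generalized_edge_count` are hypothetical.

```python
import math
import random


def mst_edges(points, dist):
    """Minimum spanning tree of the complete graph on `points` (Prim's algorithm).

    Returns (i, j) index pairs. The MST is only one possible similarity
    graph and is an illustrative assumption here.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight into the current tree
    parent = [-1] * n      # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def within_counts(edges, labels):
    """Edges joining two sample-0 points (R1) and two sample-1 points (R2)."""
    r1 = sum(1 for i, j in edges if labels[i] == labels[j] == 0)
    r2 = sum(1 for i, j in edges if labels[i] == labels[j] == 1)
    return r1, r2


def generalized_edge_count(x, y, dist, n_perm=500, seed=0):
    """Hypothetical quadratic-form statistic on (R1, R2) with a permutation p-value.

    Assumption: the permutation mean and covariance of (R1, R2) are estimated
    by Monte Carlo, and the same draws are reused for the p-value.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    edges = mst_edges(pooled, dist)  # the graph is fixed; only labels are permuted
    observed = within_counts(edges, labels)

    draws = []
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        draws.append(within_counts(edges, perm))

    m1 = sum(r[0] for r in draws) / n_perm
    m2 = sum(r[1] for r in draws) / n_perm
    s11 = sum((r[0] - m1) ** 2 for r in draws) / (n_perm - 1)
    s22 = sum((r[1] - m2) ** 2 for r in draws) / (n_perm - 1)
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in draws) / (n_perm - 1)
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("degenerate permutation covariance; try more permutations")

    def stat(r):
        # (r - m)^T Sigma^{-1} (r - m), using the explicit 2x2 inverse
        d1, d2 = r[0] - m1, r[1] - m2
        return (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det

    s_obs = stat(observed)
    exceed = sum(1 for r in draws if stat(r) >= s_obs)
    return s_obs, (1 + exceed) / (1 + n_perm)
```

Because `dist` can be any callable that returns a dissimilarity, the same function could in principle be applied to non-Euclidean objects, for example strings compared with an edit distance you supply. Reusing one set of permutation draws for both the moment estimates and the p-value is a simplification. A more careful version would use separate draws or exact permutation moments.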
Pages: 397-409
Number of pages: 13
Related papers
  • [1] GRAPH-BASED TWO-SAMPLE TESTS FOR DATA WITH REPEATED OBSERVATIONS
    Zhang, Jingru
    Chen, Hao
    STATISTICA SINICA, 2022, 32 (01) : 391 - 415
  • [2] GRAPH-BASED TESTS FOR TWO-SAMPLE COMPARISONS OF CATEGORICAL DATA
    Chen, Hao
    Zhang, Nancy R.
    STATISTICA SINICA, 2013, 23 (04) : 1479 - 1503
  • [3] On a new multivariate two-sample test
    Baringhaus, L
    Franz, C
    JOURNAL OF MULTIVARIATE ANALYSIS, 2004, 88 (01) : 190 - 206
  • [4] Weighted Graph-Based Two-Sample Test via Empirical Likelihood
    Zhao, Xiaofeng
    Yuan, Mingao
    MATHEMATICS, 2024, 12 (17)
  • [5] A Weighted Edge-Count Two-Sample Test for Multivariate and Object Data
    Chen, Hao
    Chen, Xu
    Su, Yi
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2018, 113 (523) : 1146 - 1155
  • [6] On some graph-based two-sample tests for high dimension, low sample size data
    Sarkar, Soham
    Biswas, Rahul
    Ghosh, Anil K.
    MACHINE LEARNING, 2020, 109 (02) : 279 - 306
  • [7] New test for the multivariate two-sample problem based on the concept of minimum energy
    Aslan, B
    Zech, G
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2005, 75 (02) : 109 - 119
  • [8] A Multivariate two-sample mean test for small sample size and missing data
    Wu, Yujun
    Genton, Marc G.
    Stefanski, Leonard A.
    BIOMETRICS, 2006, 62 (03) : 877 - 885
  • [9] Two-sample tests for multivariate functional data
    Jiang, Qing
    Meintanis, Simos G.
    Zhu, Lixing
    FUNCTIONAL STATISTICS AND RELATED FIELDS, 2017, : 145 - 154