A New Graph-Based Two-Sample Test for Multivariate and Object Data

Cited by: 69
Authors
Chen, Hao [1]
Friedman, Jerome H. [2]
Affiliations
[1] Univ Calif Davis, Dept Stat, 4218 Math Sci, Davis, CA 95616 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Science Foundation (USA)
Keywords
General alternatives; Nonparametrics; Permutation null distribution; Similarity graph; COVARIATE BALANCE; SMIRNOV; DISTRIBUTIONS; NETWORK; SAMPLE;
DOI
10.1080/01621459.2016.1147356
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Two-sample tests for multivariate data and especially for non-Euclidean data are not well explored. This article presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can be applied to multivariate data and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined, which can usually be provided by domain experts. Existing tests based on a similarity graph lack power either for location or for scale alternatives. The new test uses a common pattern that was overlooked previously, and works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well under finite samples, facilitating its application to large datasets. The new test is illustrated on two applications: The assessment of covariate balance in a matched observational study, and the comparison of network data under different conditions.
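The abstract does not spell out the test statistic, so the following is only a minimal sketch of the general idea under stated assumptions: the similarity graph is taken to be a minimum spanning tree on the pooled observations, the two within-sample edge counts are combined in a quadratic form, and the null distribution is approximated by Monte Carlo label permutations rather than by the asymptotic result derived in the article. All function names (`mst_edges`, `within_counts`, `graph_two_sample_test`) are hypothetical and are not taken from the article.

```python
import math
import random


def mst_edges(dist):
    """Prim's algorithm on a dense dissimilarity matrix; returns n-1 edges (i, j)."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known link from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


def within_counts(edges, labels):
    """Count edges joining two points of sample 0 (r0) and two points of sample 1 (r1)."""
    r0 = sum(1 for i, j in edges if labels[i] == 0 and labels[j] == 0)
    r1 = sum(1 for i, j in edges if labels[i] == 1 and labels[j] == 1)
    return r0, r1


def graph_two_sample_test(x, y, dissimilarity, n_perm=500, seed=0):
    """Return (statistic, permutation p-value) for samples x and y.

    Assumption: the statistic is a quadratic form in (r0, r1), centred and
    scaled by moments estimated from the permuted labels.
    """
    pooled = list(x) + list(y)
    dist = [[dissimilarity(a, b) for b in pooled] for a in pooled]
    edges = mst_edges(dist)
    labels = [0] * len(x) + [1] * len(y)

    rng = random.Random(seed)
    observed = within_counts(edges, labels)
    perm = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        perm.append(within_counts(edges, shuffled))

    # Permutation estimates of the mean and covariance of (r0, r1).
    m0 = sum(p[0] for p in perm) / n_perm
    m1 = sum(p[1] for p in perm) / n_perm
    v00 = sum((p[0] - m0) ** 2 for p in perm) / (n_perm - 1)
    v11 = sum((p[1] - m1) ** 2 for p in perm) / (n_perm - 1)
    v01 = sum((p[0] - m0) * (p[1] - m1) for p in perm) / (n_perm - 1)
    det = v00 * v11 - v01 * v01
    if det <= 1e-12:
        raise ValueError("degenerate permutation covariance; graph too small or star-shaped")

    def stat(r):
        # Quadratic form d^T Sigma^{-1} d using the explicit 2x2 inverse.
        d0, d1 = r[0] - m0, r[1] - m1
        return (v11 * d0 * d0 - 2.0 * v01 * d0 * d1 + v00 * d1 * d1) / det

    s_obs = stat(observed)
    exceed = sum(1 for p in perm if stat(p) >= s_obs)
    return s_obs, (exceed + 1) / (n_perm + 1)
```

For Euclidean data, `math.dist` can be passed as `dissimilarity`; for object data such as networks, any user-supplied pairwise dissimilarity function would take its place. Using both within-sample counts, rather than only the number of between-sample edges, is what lets a statistic of this form react to scale differences as well as location shifts.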
Pages: 397-409
Page count: 13
Related papers
50 in total
  • [21] A Kernel Two-Sample Test for Functional Data
    Wynne, George
    Duncan, Andrew B.
    Journal of Machine Learning Research, 2022, 23 : 1 - 51
  • [22] A Sequential Non-Parametric Multivariate Two-Sample Test
    Lheritier, Alix
    Cazals, Frederic
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2018, 64 (05) : 3361 - 3370
  • [23] A TWO-SAMPLE TEST
    Moses, Lincoln E.
    PSYCHOMETRIKA, 1952, 17 (03) : 239 - 247
  • [24] The new and improved two-sample t test
    Keselman, HJ
    Othman, AR
    Wilcox, RR
    Fradette, K
    PSYCHOLOGICAL SCIENCE, 2004, 15 (01) : 47 - 51
  • [25] Two-sample test based on classification probability
    Cai, Haiyan
    Goggin, Bryan
    Jiang, Qingtang
    STATISTICAL ANALYSIS AND DATA MINING, 2020, 13 (01) : 5 - 13
  • [26] Two-sample test for comparing ambiguity in fuzzy data
    Grzegorzewski, Przemyslaw
    2022 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2022,
  • [27] Two-sample test for compositional data with ball divergence
    Zhu, Jin
    Lv, Kunsheng
    Zhang, Aijun
    Pan, Wenliang
    Wang, Xueqin
    STATISTICS AND ITS INTERFACE, 2019, 12 (02) : 275 - 282
  • [28] Data driven rank test for two-sample problem
    Janic-Wróblewska, A
    Ledwina, T
    SCANDINAVIAN JOURNAL OF STATISTICS, 2000, 27 (02) : 281 - 297
  • [29] The new robust two-sample test for randomly right-censored data
    Philonenko, Petr
    Postovalov, Sergey
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2019, 89 (08) : 1357 - 1375
  • [30] Projection-based two-sample inference for sparsely observed multivariate functional data
    Koner, Salil
    Luo, Sheng
    BIOSTATISTICS, 2024,