A New Graph-Based Two-Sample Test for Multivariate and Object Data

Cited by: 69
Authors
Chen, Hao [1]
Friedman, Jerome H. [2]
Affiliations
[1] Univ Calif Davis, Dept Stat, 4218 Math Sci, Davis, CA 95616 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Funding
National Science Foundation (USA);
Keywords
General alternatives; Nonparametrics; Permutation null distribution; Similarity graph; COVARIATE BALANCE; SMIRNOV; DISTRIBUTIONS; NETWORK; SAMPLE;
DOI
10.1080/01621459.2016.1147356
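Chinese Library Classification (CLC) Number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline Classification Code
020208 ; 070103 ; 0714 ;
Abstract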
Two-sample tests for multivariate data and especially for non-Euclidean data are not well explored. This article presents a novel test statistic based on a similarity graph constructed on the pooled observations from the two samples. It can be applied to multivariate data and non-Euclidean data as long as a dissimilarity measure on the sample space can be defined, which can usually be provided by domain experts. Existing tests based on a similarity graph lack power either for location or for scale alternatives. The new test uses a common pattern that was overlooked previously, and works for both types of alternatives. The test exhibits substantial power gains in simulation studies. Its asymptotic permutation null distribution is derived and shown to work well under finite samples, facilitating its application to large datasets. The new test is illustrated on two applications: The assessment of covariate balance in a matched observational study, and the comparison of network data under different conditions.
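The general idea in the abstract (build a similarity graph on the pooled observations, then calibrate a graph statistic against its permutation null) can be sketched roughly as follows. This is only an illustrative sketch under several assumptions that are not stated in this record: the similarity graph is taken to be a Euclidean minimum spanning tree, the statistic is assumed to be a quadratic form in the two within-sample edge counts, and the null moments are estimated by random label permutations rather than by the asymptotic distribution the abstract mentions. The names `mst_edges`, `within_counts`, and `graph_two_sample_test` are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns n-1 edges (i, j).
    Assumption: a minimum spanning tree serves as the similarity graph."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def within_counts(edges, labels):
    """Count edges joining two sample-1 points (r1) and two sample-2 points (r2)."""
    r1 = sum(1 for i, j in edges if labels[i] == 0 and labels[j] == 0)
    r2 = sum(1 for i, j in edges if labels[i] == 1 and labels[j] == 1)
    return r1, r2


def graph_two_sample_test(x, y, n_perm=500, seed=0):
    """Permutation test on an assumed quadratic-form statistic of (r1, r2).
    Returns (statistic, permutation p-value)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    labels = [0] * len(x) + [1] * len(y)
    edges = mst_edges(pooled)   # the graph is built once, on the pooled data
    observed = within_counts(edges, labels)

    # Permuting labels leaves the graph fixed and only changes the edge counts.
    shuffled = labels[:]
    perm = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        perm.append(within_counts(edges, shuffled))

    # Permutation estimates of the mean vector and 2x2 covariance of (r1, r2).
    m1 = sum(r[0] for r in perm) / n_perm
    m2 = sum(r[1] for r in perm) / n_perm
    s11 = sum((r[0] - m1) ** 2 for r in perm) / n_perm
    s22 = sum((r[1] - m2) ** 2 for r in perm) / n_perm
    s12 = sum((r[0] - m1) * (r[1] - m2) for r in perm) / n_perm
    det = s11 * s22 - s12 ** 2   # assumed nonzero (non-degenerate counts)

    def stat(r):
        # Quadratic form d' * inverse(Sigma) * d, written out for the 2x2 case.
        d1, d2 = r[0] - m1, r[1] - m2
        return (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det

    s_obs = stat(observed)
    p_value = (1 + sum(stat(r) >= s_obs for r in perm)) / (n_perm + 1)
    return s_obs, p_value
```

Calling `graph_two_sample_test(x, y)` with two lists of coordinate tuples returns the statistic and a permutation p-value. For non-Euclidean objects, `math.dist` would be replaced by whatever dissimilarity measure is available for the sample space.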
Pages: 397-409
Number of pages: 13
Related papers
50 in total
  • [41] Two-sample test based on maximum variance discrepancy
    Makigusa, N.
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2024, 53 (15) : 5421 - 5438
  • [42] Unsupervised Classifier Selection Based on Two-Sample Test
    Aho, Timo
    Elomaa, Tapio
    Kujala, Jussi
    DISCOVERY SCIENCE, PROCEEDINGS, 2008, 5255 : 28 - 39
  • [43] A nonparametric two-sample test applicable to high dimensional data
    Biswas, Munmun
    Ghosh, Anil K.
    JOURNAL OF MULTIVARIATE ANALYSIS, 2014, 123 : 160 - 171
  • [44] A two-sample Bayesian t-test for microarray data
    Fox, RJ
    Dimmic, MW
    BMC BIOINFORMATICS, 2006, 7 (1)
  • [45] THE USE OF TWO-SAMPLE t-TEST IN THE REAL DATA
    Al-Kassab, Mowafaq Muhammed
    Majeed, Aveen Hameed
    ADVANCES AND APPLICATIONS IN STATISTICS, 2022, 81 : 13 - 22
  • [46] A two-sample Bayesian t-test for microarray data
    Richard J Fox
    Matthew W Dimmic
    BMC Bioinformatics, 7
  • [47] Two-sample multivariate similarity permutation comparison
    Mielke, Paul W., Jr.
    Berry, Kenneth J.
    PSYCHOLOGICAL REPORTS, 2007, 100 (01) : 257 - 262
  • [48] Distribution-free two-sample homogeneity test for circular data based on distance
    Ali, Ahmed Jebur
    Abushilah, Samira Faisal
    INTERNATIONAL JOURNAL OF NONLINEAR ANALYSIS AND APPLICATIONS, 2022, 13 (01) : 2703 - 2711
  • [49] Graph-Based Data Association in Multiple Object Tracking: A Survey
    Touska, Despoina
    Gkountakos, Konstantinos
    Tsikrika, Theodora
    Ioannidis, Konstantinos
    Vrochidis, Stefanos
    Kompatsiaris, Ioannis
    MULTIMEDIA MODELING, MMM 2023, PT II, 2023, 13834 : 386 - 398
  • [50] New two-sample test utilizing interpoint distance discrepancy
    Xu, Dong
    STAT, 2024, 13 (03):