A framework for paired-sample hypothesis testing for high-dimensional data

Authors
Bargiotas, Ioannis [1]
Kalogeratos, Argyris [1]
Vayatis, Nicolas [1]
Affiliation
[1] ENS Paris Saclay, Ctr Borelli, Gif Sur Yvette, France
Keywords
statistical hypothesis testing; paired-sample testing; p-value correction; Hodges-Lehmann estimator; pseudomedian; multidimensional data
DOI
10.1109/ICTAI59109.2023.00011
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The standard approach to paired-sample testing in the multidimensional setting applies a univariate test to each feature separately and then adjusts the resulting p-values for multiple comparisons. This approach degrades as the number of features grows. A number of studies have shown that classification accuracy can serve as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far for extending this strategy to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. The optimal scoring function can then be obtained as the pseudomedian of those rules, which we estimate through a natural extension of the Hodges-Lehmann estimator. We accordingly propose a two-step testing framework. First, we estimate the bisecting hyperplane for each pair of instances, together with an aggregated rule derived through the Hodges-Lehmann estimator; the paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on this representation. Our experiments indicate that our approach achieves substantial gains in testing accuracy over traditional multivariate tests and multiple univariate testing, while also estimating each feature's contribution to the final result.
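The two-step procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`bisector_rules`, `hodges_lehmann`, `paired_hl_test`) are hypothetical. The "natural extension" of the Hodges-Lehmann estimator is assumed here to be a coordinate-wise median of Walsh averages over the hyperplane parameters (normal vector and offset); the paper's actual extension may differ. A random half-split, used to fit the aggregated rule on one half of the pairs and test on the other, is also an assumption added here so that the Wilcoxon p-value is not computed on the same pairs used to build the rule. The sketch requires NumPy and SciPy (`scipy.stats.wilcoxon`).

```python
import numpy as np
from scipy.stats import wilcoxon


def bisector_rules(X, Y):
    """Perpendicular bisecting hyperplane of each segment X[i]--Y[i].

    Rule i is f_i(z) = W[i] @ z - b[i]; it is zero at the midpoint of the
    segment and positive on the side of X[i].
    """
    W = X - Y                                                # normal vectors
    b = 0.5 * (np.sum(X**2, axis=1) - np.sum(Y**2, axis=1))  # offsets
    return W, b


def hodges_lehmann(V):
    """Coordinate-wise Hodges-Lehmann estimate: median of Walsh averages.

    For V of shape (n,) or (n, p), returns the (column-wise) median of
    (V[i] + V[j]) / 2 over all index pairs i <= j.
    """
    i, j = np.triu_indices(V.shape[0])  # all pairs i <= j, diagonal included
    return np.median((V[i] + V[j]) / 2.0, axis=0)


def paired_hl_test(X, Y, seed=0):
    """Hypothetical two-step paired test; returns (p-value, aggregated normal w)."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n = X.shape[0]
    # Assumption: random half-split so the rule is not fitted on the test pairs.
    idx = np.random.default_rng(seed).permutation(n)
    fit, test = idx[: n // 2], idx[n // 2 :]

    # Step 1: per-pair bisecting hyperplanes, aggregated by Hodges-Lehmann.
    W, b = bisector_rules(X[fit], Y[fit])
    w, b_agg = hodges_lehmann(W), hodges_lehmann(b)

    def score(Z):
        # Unidimensional representation given by the aggregated rule.
        return Z @ w - b_agg

    # Step 2: Wilcoxon signed-rank test on the paired score differences.
    d = score(X[test]) - score(Y[test])
    return wilcoxon(d).pvalue, w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 20
    Y = rng.normal(size=(n, p))
    shift = np.zeros(p)
    shift[:3] = 1.0                      # only the first 3 features differ
    X = Y + shift + rng.normal(size=(n, p))
    pval, w = paired_hl_test(X, Y)
    print(f"p-value: {pval:.2e}")
    print("features with largest |w|:", np.argsort(-np.abs(w))[:3])
```

In this sketch the offset cancels in the paired difference, since f(x_i) - f(y_i) = w · (x_i - y_i), so the test depends only on the aggregated normal vector `w`. Its coordinates then give a rough indication of each feature's contribution, which in the simulated example should concentrate on the three shifted features.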
Pages: 16-21
Number of pages: 6