Adaptive threshold-based classification of sparse high-dimensional data

Cited by: 0
Authors
Pavlenko, Tatjana [1]
Stepanova, Natalia [2]
Thompson, Lee [2]
Affiliations
[1] Uppsala Univ, Dept Stat, Box 513, S-75120 Uppsala, Sweden
[2] Carleton Univ, Sch Math & Stat, 1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada
Source
ELECTRONIC JOURNAL OF STATISTICS | 2022 / Volume 16 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
High-dimensional data; sparse vectors; adaptive threshold-based classification; asymptotically optimal classifier; higher criticism; selection
DOI
10.1214/22-EJS1998
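Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract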
We revisit the problem of designing an efficient binary classifier in a challenging high-dimensional framework. The model under study assumes some local dependence structure among feature variables represented by a block-diagonal covariance matrix with a growing number of blocks of an arbitrary, but fixed size. The blocks correspond to non-overlapping independent groups of strongly correlated features. To assess the relevance of a particular block in predicting the response, we introduce a measure of "signal strength" pertaining to each feature block. This measure is then used to specify a sparse model of our interest. We further propose a threshold-based feature selector which operates as a screen-and-clean scheme integrated into a linear classifier: the data is subject to screening and hard threshold cleaning to filter out the blocks that contain no signals. Asymptotic properties of the proposed classifiers are studied when the sample size n depends on the number of feature blocks b, and the sample size goes to infinity with b at a slower rate than b. The new classifiers, which are fully adaptive to unknown parameters of the model, are shown to perform asymptotically optimally in a large part of the classification region. The numerical study confirms good analytical properties of the new classifiers that compare favorably to the existing threshold-based procedure used in a similar context.
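The screen-and-clean idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual block statistic or its adaptive threshold, so everything specific here is an assumption: the per-block score (a quadratic form of the estimated mean difference in the pooled within-block precision matrix), the user-supplied `threshold`, and the hypothetical function names `block_scores`, `fit_threshold_lda`, and `predict`. The sketch only shows how hard thresholding of block scores can be combined with a linear discriminant rule under a block-diagonal covariance.

```python
import numpy as np

def block_scores(X0, X1, block_size):
    """Per-block signal-strength proxy (an assumed statistic, not the paper's)."""
    n0, n1 = len(X0), len(X1)
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    # Residuals around each class mean, pooled across the two classes.
    resid = np.vstack([X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)])
    scores, precisions = [], []
    for start in range(0, X0.shape[1], block_size):
        sl = slice(start, start + block_size)
        # Pooled covariance estimate for this block only.
        S = resid[:, sl].T @ resid[:, sl] / (n0 + n1 - 2)
        S += 1e-8 * np.eye(S.shape[0])  # tiny ridge for numerical stability
        P = np.linalg.inv(S)
        d = diff[sl]
        scores.append(d @ P @ d)
        precisions.append(P)
    return np.array(scores), precisions

def fit_threshold_lda(X0, X1, block_size, threshold):
    """Keep blocks whose score exceeds `threshold`; zero out all others."""
    scores, precisions = block_scores(X0, X1, block_size)
    keep = scores > threshold  # hard thresholding of block scores
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    w = np.zeros(X0.shape[1])
    for j, P in enumerate(precisions):
        if keep[j]:
            sl = slice(j * block_size, (j + 1) * block_size)
            w[sl] = P @ diff[sl]  # block-wise linear discriminant direction
    midpoint = (X0.mean(axis=0) + X1.mean(axis=0)) / 2
    return w, midpoint, keep

def predict(X, w, midpoint):
    """Assign class 1 when the linear score is positive, else class 0."""
    return ((X - midpoint) @ w > 0).astype(int)
```

In this sketch the threshold is fixed by the caller, whereas the abstract states that the proposed classifiers are fully adaptive to unknown model parameters; a data-driven choice of threshold would replace the `threshold` argument.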
Pages: 1952 - 1996
Number of pages: 45