Classification of sparse high-dimensional vectors

Cited by: 23
Authors
Ingster, Yuri I. [2]
Pouet, Christophe [1]
Tsybakov, Alexandre B. [3,4]
Affiliations
[1] Univ Provence, LATP, F-13453 Marseille 13, France
[2] St Petersburg State Electrotech Univ, St Petersburg 197376, Russia
[3] Univ Paris 06, LPMA, F-75252 Paris 05, France
[4] CREST, Stat Lab, F-92240 Malakoff, France
Funding
Engineering and Physical Sciences Research Council (UK)
Keywords
Bayes risk; classification boundary; high-dimensional data; optimal classifier; sparse vectors; acute lymphoblastic leukemia; higher criticism; mixtures
DOI
10.1098/rsta.2009.0156
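Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]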
Subject classification codes
07; 0710; 09
Abstract
We study the problem of classification of d-dimensional vectors into two classes (one of which is 'pure noise') based on a training sample of size m. The main specific feature is that the dimension d can be very large. We suppose that the difference between the distribution of the population and that of the noise is only in a shift, which is a sparse vector. For Gaussian noise, fixed sample size m, and dimension d that tends to infinity, we obtain the sharp classification boundary, i.e. the necessary and sufficient conditions for the possibility of successful classification. We propose classifiers attaining this boundary. We also give extensions of the result to the case where the sample size m depends on d and satisfies the condition (log m)/(log d) → γ, 0 ≤ γ < 1, and to the case of non-Gaussian noise satisfying the Cramér condition.
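The abstract does not describe the classifiers themselves. The Python sketch below only simulates the setting it describes, under illustrative assumptions that are not taken from the paper: the shift θ has s nonzero coordinates, all equal to a chosen amplitude; the noise is N(0, I_d); and the m training vectors are drawn from the shifted population. The rule shown is a generic plug-in likelihood-ratio classifier built on a hard-thresholded sample mean with threshold sqrt(2 log d / m). It is not the boundary-attaining classifier of the paper, and every function and parameter name in it is hypothetical.

```python
import math
import random


def make_shift(d, s, amplitude, rng):
    """Hypothetical sparse shift: `s` random coordinates equal `amplitude`, the rest 0."""
    theta = [0.0] * d
    for j in rng.sample(range(d), s):
        theta[j] = amplitude
    return theta


def draw(theta, rng):
    """One observation X = theta + xi, with xi ~ N(0, I_d) (Gaussian noise assumed)."""
    return [t + rng.gauss(0.0, 1.0) for t in theta]


def fit_thresholded_mean(train, threshold):
    """Average the m training vectors, then zero out coordinates below `threshold`."""
    m, d = len(train), len(train[0])
    mean = [sum(x[j] for x in train) / m for j in range(d)]
    return [v if abs(v) > threshold else 0.0 for v in mean]


def classify(theta_hat, x):
    """Plug-in Gaussian log-likelihood ratio <theta_hat, x> - |theta_hat|^2 / 2.

    Returns 1 (shifted population) if the ratio is positive, else 0 (pure noise).
    """
    score = sum(t * xj for t, xj in zip(theta_hat, x))
    energy = sum(t * t for t in theta_hat)
    return 1 if score - energy / 2.0 > 0.0 else 0


def run_demo(d=2000, s=20, m=5, amplitude=3.0, n_test=200, seed=0):
    """Simulate training and test data; return (#selected coordinates, test error)."""
    rng = random.Random(seed)
    theta = make_shift(d, s, amplitude, rng)
    train = [draw(theta, rng) for _ in range(m)]
    # Each coordinate of the sample mean has noise variance 1/m, so use the
    # universal threshold sqrt(2 log d / m) to discard pure-noise coordinates.
    theta_hat = fit_thresholded_mean(train, math.sqrt(2.0 * math.log(d) / m))
    zero = [0.0] * d
    errors = 0
    for _ in range(n_test):
        errors += classify(theta_hat, draw(zero, rng))       # true label 0
        errors += 1 - classify(theta_hat, draw(theta, rng))  # true label 1
    n_selected = sum(1 for v in theta_hat if v != 0.0)
    return n_selected, errors / (2 * n_test)


if __name__ == "__main__":
    selected, error = run_demo()
    print(f"selected coordinates: {selected}, empirical test error: {error:.3f}")
```

With the default parameters (d = 2000, s = 20, m = 5) the shifted coordinates sit well above the threshold, so this toy rule should classify almost perfectly; shrinking `amplitude` or `s` moves the simulation toward the hard regime that the sharp boundary in the paper characterizes.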
Pages: 4427-4448
Page count: 22
Related papers (50 in total)
  • [1] High-Dimensional Computing with Sparse Vectors
    Laiho, Mika
    Poikonen, Jussi H.
    Kanerva, Pentti
    Lehtonen, Eero
    2015 IEEE BIOMEDICAL CIRCUITS AND SYSTEMS CONFERENCE (BIOCAS), 2015: 515-518
  • [2] Classification with High-Dimensional Sparse Samples
    Huang, Dayu
    Meyn, Sean
    2012 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY PROCEEDINGS (ISIT), 2012
  • [3] Empirical Bayes estimators for high-dimensional sparse vectors
    Srinath, K. Pavan
    Venkataramanan, Ramji
    INFORMATION AND INFERENCE: A JOURNAL OF THE IMA, 2020, 9(1): 195-234
  • [4] High-Dimensional Classification by Sparse Logistic Regression
    Abramovich, Felix
    Grinshtein, Vadim
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2019, 65(5): 3068-3079
  • [5] Statistical Sparse Independence Rule for High-dimensional Classification
    Wang, Liping
    Ji, Changtai
    Xie, Shanggao
    Zhang, Qi
    2016 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE WORKSHOPS (WIW 2016), 2016: 50-53
  • [6] On the classification consistency of high-dimensional sparse neural network
    Yang, Kaixu
    Maiti, Taps
    2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019: 173-182
  • [7] Indexing very high-dimensional sparse and quasi-sparse vectors for similarity searches
    Wang, CZ
    Wang, XS
    VLDB JOURNAL, 2001, 9(4): 344-361
  • [8] Spectral clustering of high-dimensional data exploiting sparse representation vectors
    Wu, Sen
    Feng, Xiaodong
    Zhou, Wenjun
    NEUROCOMPUTING, 2014, 135: 229-239
  • [9] Sparse representation approaches for the classification of high-dimensional biological data
    Li, Yifeng
    Ngom, Alioune
    BMC SYSTEMS BIOLOGY, 2013, 7