Classification for high-dimension low-sample size data

Authors
Shen, Liran [1]
Er, Meng Joo [1]
Yin, Qingbo [2]
Affiliations
[1] Dalian Maritime Univ, Coll Marine Elect Engn, Dalian 116023, Peoples R China
[2] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Binary linear classifier; Quadratic programming; Data piling; Covariance matrix;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High-dimension, low-sample-size (HDLSS) data sets pose great challenges to many machine learning methods, and new classification techniques are needed to deal with practical HDLSS problems. After the cause of the over-fitting phenomenon is identified, a new classification criterion for HDLSS data sets, termed tolerance similarity, is proposed; it emphasizes maximization of within-class variance on the premise of class separability. Building on this criterion, a novel linear binary classifier, termed the No-separated Data Maximum Dispersion classifier (NPDMD), is designed. The main idea of the NPDMD is to spread the samples of the two classes over a large interval in the respective positive or negative space along the projecting direction, provided that the distance between the projected means of the two classes is large enough. The salient features of the proposed NPDMD are: (1) it operates well on HDLSS data sets; (2) it solves the objective function in the entire feature space to avoid the data piling phenomenon; (3) it leverages the low-rank property of the covariance matrix for HDLSS data sets to accelerate computation; (4) it is suitable for different real-world applications; and (5) it can be implemented readily using Quadratic Programming. Theoretical properties of the NPDMD have been derived, and a series of evaluations has been conducted on one simulated and six real-world benchmark data sets, including face classification and mRNA classification. The experimental results demonstrate the superiority of the NPDMD in terms of correct classification rate, mean within-group correct classification rate, and the area under the ROC curve. (c) 2022 Elsevier Ltd. All rights reserved.
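The low-rank property mentioned in feature (3) can be illustrated with a small sketch. This is not the authors' NPDMD implementation; it only shows, on synthetic Gaussian data, that with n samples in d dimensions (n much smaller than d) the sample covariance has rank at most n - 1, and that the variance along a projection direction can be computed from the centered data matrix without forming the d x d covariance. The function names `projected_variance` and `covariance_rank`, the sizes n = 20 and d = 500, and the random data are all assumptions made for illustration.

```python
import numpy as np


def projected_variance(X, w):
    """Sample variance of the rows of X (n x d) projected onto direction w.

    Equals w^T S w for the sample covariance S, but costs O(n * d)
    because the d x d matrix S is never formed.
    """
    Xc = X - X.mean(axis=0)          # center each feature
    z = Xc @ w                       # n projected values
    return float(z @ z) / (X.shape[0] - 1)


def covariance_rank(X):
    """Rank of the sample covariance, read off the centered data matrix."""
    Xc = X - X.mean(axis=0)
    return int(np.linalg.matrix_rank(Xc))


# Synthetic HDLSS setting (illustrative sizes, not taken from the paper).
rng = np.random.default_rng(0)
n, d = 20, 500
X = rng.standard_normal((n, d))
w = rng.standard_normal(d)
w /= np.linalg.norm(w)

fast = projected_variance(X, w)

# Reference computation that does build the full d x d covariance.
S = np.cov(X, rowvar=False)
slow = float(w @ S @ w)

rank = covariance_rank(X)
print("projected variance (low-rank route):", fast)
print("projected variance (full covariance):", slow)
print("covariance rank:", rank, "of", d, "dimensions")
```

Both routes give the same projected variance, while the rank shows that the covariance is informative in only a few of the d directions. That is the kind of structure a method can exploit to reduce computation on HDLSS data.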
Pages: 18
Related papers
50 in total
  • [1] Classification for high-dimension low-sample size data
    Shen, Liran
    Er, Meng Joo
    Yin, Qingbo
    [J]. PATTERN RECOGNITION, 2022, 130
  • [2] Statistical Significance of Clustering for High-Dimension, Low-Sample Size Data
    Liu, Yufeng
    Hayes, David Neil
    Nobel, Andrew
    Marron, J. S.
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2008, 103 (483) : 1281 - 1293
  • [3] Some considerations of classification for high dimension low-sample size data
    Zhang, Lingsong
    Lin, Xihong
    [J]. STATISTICAL METHODS IN MEDICAL RESEARCH, 2013, 22 (05) : 537 - 550
  • [4] Structural Classification based Correlation and its Application to Principal Component Analysis for High-Dimension Low-Sample Size Data
    Sato-Ilic, Mika
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2012,
  • [5] Unsupervised classification of high-dimension and low-sample data with variational autoencoder based dimensionality reduction
    Mahmud, Mohammad Sultan
    Fu, Xianghua
    [J]. 2019 IEEE 4TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM 2019), 2019, : 498 - 503
  • [6] Experimental Analysis of Feature Selection Stability for High-Dimension and Low-Sample Size Gene Expression Classification Task
    Dernoncourt, David
    Hanczar, Blaise
    Zucker, Jean-Daniel
    [J]. IEEE 12TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS & BIOENGINEERING, 2012, : 350 - 355
  • [7] Random forest kernel for high-dimension low sample size classification
    Cavalheiro, Lucca Portes
    Bernard, Simon
    Barddal, Jean Paul
    Heutte, Laurent
    [J]. STATISTICS AND COMPUTING, 2024, 34 (01)
  • [9] High-dimension, low-sample size perspectives in constrained statistical inference: The SARSCoV RNA genome in illustration
    Sen, Pranab K.
    Tsai, Ming-Tien
    Jou, Yuh-Shan
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2007, 102 (478) : 686 - 694
  • [10] Intrinsic Dimensionality Estimation of High-Dimension, Low Sample Size Data with D-Asymptotics
    Yata, Kazuyoshi
    Aoshima, Makoto
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2010, 39 (8-9) : 1511 - 1521