Classification for high-dimension low-sample size data

Cited by: 0

Authors:

Shen, Liran [1]

Er, Meng Joo [1]

Yin, Qingbo [2]

Affiliations:

[1] Dalian Maritime Univ, Coll Marine Elect Engn, Dalian 116023, Peoples R China

[2] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116023, Peoples R China

Source:

PATTERN RECOGNITION | 2022 / Vol. 130

Funding:

National Natural Science Foundation of China;

Keywords:

Binary linear classifier; Quadratic programming; Data piling; Covariance matrix;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

High-dimension, low-sample-size (HDLSS) data sets pose great challenges to many machine learning methods, and new classification techniques are needed to deal with practical HDLSS problems. After the cause of the over-fitting phenomenon is identified, a new classification criterion for HDLSS data sets, termed tolerance similarity, is proposed to emphasize maximization of within-class variance on the premise of class separability. Building on this criterion, a novel linear binary classifier, termed No-separated Data Maximum Dispersion classifier (NPDMD), is designed. The main idea of the NPDMD is to spread the samples of the two classes over a large interval in the respective positive or negative space along the projection direction, provided that the distance between the projected means of the two classes is large enough. The salient features of the proposed NPDMD are: (1) the NPDMD operates well on HDLSS data sets; (2) the NPDMD solves the objective function in the entire feature space to avoid the data piling phenomenon; (3) the NPDMD exploits the low-rank property of the covariance matrix of HDLSS data sets to speed up computation; (4) the NPDMD is suitable for different real-world applications; (5) the NPDMD can be implemented readily using quadratic programming. Theoretical properties of the NPDMD are derived, and a series of evaluations is conducted on one simulated and six real-world benchmark data sets, including face classification and mRNA classification. Experimental results and comprehensive studies demonstrate the superiority of the NPDMD in terms of correct classification rate, mean within-group correct classification rate, and area under the ROC curve. © 2022 Elsevier Ltd. All rights reserved.
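
The abstract names two properties of HDLSS data without showing them: the low rank of the sample covariance matrix and the data piling phenomenon. The following is a minimal NumPy sketch of both on synthetic data. It is not the NPDMD and does not reproduce the paper's objective function; the sample sizes, the Gaussian data, and the use of a minimum-norm least-squares direction as the "piling" direction are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_class, d = 10, 500  # hypothetical sizes: 20 samples, 500 features

# Synthetic two-class data (an assumption, not the paper's benchmark data).
X = np.vstack([
    rng.normal(loc=+0.2, size=(n_per_class, d)),
    rng.normal(loc=-0.2, size=(n_per_class, d)),
])
y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])

# Low-rank property: the centred data matrix, and hence the d x d sample
# covariance built from it, has rank at most n - 1 even though d = 500.
Xc = X - X.mean(axis=0)
cov_rank = np.linalg.matrix_rank(Xc)

# Data piling: when d >= n, a direction w and offset b exist such that
# X @ w + b equals the label exactly. lstsq returns the minimum-norm one.
A = np.hstack([X, np.ones((len(y), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]
proj = X @ w + b

# Within-class spread of the projections; near zero means the samples
# of each class have collapsed onto a single point.
spread_pos = proj[y > 0].std()
spread_neg = proj[y < 0].std()

print("rank of centred data:", cov_rank, "of", d, "dimensions")
print("within-class spread of projections:", spread_pos, spread_neg)
```

In this sketch every training sample of a class projects onto the same value, which is the behaviour the abstract's criterion of maximizing within-class variance along the projection direction is meant to counteract.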


Pages: 18

