Classification for high-dimension low-sample size data

Cited by: 10
Authors
Shen, Liran [1]
Er, Meng Joo [1]
Yin, Qingbo [2]
Affiliations
[1] Dalian Maritime Univ, Coll Marine Elect Engn, Dalian 116023, Peoples R China
[2] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Binary linear classifier; Quadratic programming; Data piling; Covariance matrix; FACE RECOGNITION; DISCRIMINATION; CLASSIFIERS; ENSEMBLE; MODELS; FOREST; ROBUST; TUMOR;
DOI
10.1016/j.patcog.2022.108828
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
High-dimension, low-sample-size (HDLSS) data sets pose great challenges to many machine learning methods, and new classification techniques are needed to deal with practical HDLSS problems. After the cause of the over-fitting phenomenon is identified, a new classification criterion for HDLSS data sets, termed tolerance similarity, is proposed to emphasize maximization of within-class variance on the premise of class separability. Based on this criterion, a novel linear binary classifier, termed the No-separated Data Maximum Dispersion classifier (NPDMD), is designed. The main idea of the NPDMD is to spread the samples of the two classes over a large interval in the respective positive or negative space along the projecting direction, provided that the distance between the projected means of the two classes is large enough. The salient features of the proposed NPDMD are: (1) the NPDMD operates well on HDLSS data sets; (2) the NPDMD solves the objective function in the entire feature space to avoid the data-piling phenomenon; (3) the NPDMD exploits the low-rank property of the covariance matrix for HDLSS data sets to speed up computation; (4) the NPDMD is suitable for different real-world applications; (5) the NPDMD can be implemented readily using Quadratic Programming. Theoretical properties of the NPDMD are derived, and a series of evaluations is conducted on one simulated and six real-world benchmark data sets, including face classification and mRNA classification. Experimental results and comprehensive studies demonstrate the superiority of the NPDMD in terms of correct classification rate, mean within-group correct classification rate, and the area under the ROC curve. © 2022 Elsevier Ltd. All rights reserved.
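The low-rank property mentioned in feature (3) can be illustrated with a short NumPy sketch. This is not the NPDMD algorithm, whose formulation is not given in the abstract; it only shows the general fact that a sample covariance matrix built from n samples in d dimensions has rank at most n - 1, so quantities involving it can be computed from n x n or n x 1 objects rather than the full d x d matrix. The sizes `n` and `d`, the random data, and the helper name `covariance_quadratic_form` are assumptions chosen for illustration.

```python
import numpy as np


def covariance_quadratic_form(X, w):
    """Return w^T S w for the sample covariance S of X without forming S.

    X has shape (n, d) with one sample per row; w has shape (d,).
    Cost is O(n * d) instead of the O(d^2) needed to build S explicitly.
    """
    Xc = X - X.mean(axis=0)          # center the samples
    z = Xc @ w                       # projections of centered samples onto w
    return float(z @ z) / (X.shape[0] - 1)


# Hypothetical HDLSS setting: far more features than samples.
rng = np.random.default_rng(0)
n, d = 20, 500
X = rng.standard_normal((n, d))

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / (n - 1)              # explicit d x d covariance, for comparison only
rank_S = np.linalg.matrix_rank(S)    # bounded by n - 1 because of centering

# The nonzero eigenvalues of S coincide with those of the n x n Gram matrix.
G = Xc @ Xc.T / (n - 1)
gram_eigs = np.sort(np.linalg.eigvalsh(G))[::-1]
cov_eigs = np.sort(np.linalg.eigvalsh(S))[::-1][:n]

w = rng.standard_normal(d)
quad_fast = covariance_quadratic_form(X, w)
quad_full = float(w @ S @ w)

print("rank of covariance:", rank_S, "upper bound:", n - 1)
print("quadratic forms agree:", np.isclose(quad_fast, quad_full))
```

Working with the centered data matrix directly keeps memory and time proportional to n x d, which is the kind of saving a low-rank covariance makes possible when d is much larger than n.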
Pages: 18
Related papers
50 in total
  • [41] Deep Neural Networks for High Dimension, Low Sample Size Data
    Liu, Bo
    Wei, Ying
    Zhang, Yu
    Yang, Qiang
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2287 - 2293
  • [42] Classification for high-dimension small-sample data sets based on Kullback-Leibler information measure
    Guo, P
    Lyu, MR
    IC-AI'2000: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 1-III, 2000, : 1187 - 1193
  • [43] OPIT: A Simple but Effective Method for Sparse Subspace Tracking in High-Dimension and Low-Sample-Size Context
    Le, Thanh Trung
    Abed-Meraim, Karim
    Trung, Nguyen Linh
    Hafiane, Adel
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 521 - 534
  • [44] Bias-corrected support vector machine with Gaussian kernel in high-dimension, low-sample-size settings
    Nakayama, Yugo
    Yata, Kazuyoshi
    Aoshima, Makoto
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2020, 72 (05) : 1257 - 1286
  • [46] Population structure-learned classifier for high-dimension low-sample-size class-imbalanced problem
    Shen, Liran
    Er, Meng Joo
    Liu, Weijiang
    Fan, Yunsheng
    Yin, Qingbo
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 111
  • [47] Graph convolutional network-based feature selection for high-dimensional and low-sample size data
    Chen, Can
    Weiss, Scott T.
    Liu, Yang-Yu
    BIOINFORMATICS, 2023, 39 (04)
  • [48] Comparison of binary discrimination methods for high dimension low sample size data
    Bolivar-Cime, A.
    Marron, J. S.
    JOURNAL OF MULTIVARIATE ANALYSIS, 2013, 115 : 108 - 121
  • [49] On Some Fast And Robust Classifiers For High Dimension, Low Sample Size Data
    Roy, Sarbojit
    Choudhury, Jyotishka Ray
    Dutta, Subhajit
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [50] Feature Selection Solution with High Dimensionality and Low-Sample Size for Land Cover Classification in Object-Based Image Analysis
    Huang, Yaohuan
    Zhao, Chuanpeng
    Yang, Haijun
    Song, Xiaoyang
    Chen, Jie
    Li, Zhonghua
    REMOTE SENSING, 2017, 9 (09)