Classification for high-dimension low-sample size data

Authors
Shen, Liran [1]
Er, Meng Joo [1]
Yin, Qingbo [2]
Affiliations
[1] Dalian Maritime Univ, Coll Marine Elect Engn, Dalian 116023, Peoples R China
[2] Dalian Maritime Univ, Coll Informat Sci & Technol, Dalian 116023, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Binary linear classifier; Quadratic programming; Data piling; Covariance matrix;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
High-dimension, low-sample-size (HDLSS) data sets pose great challenges to many machine learning methods, and new classification techniques are needed to handle practical HDLSS problems. After the cause of the over-fitting phenomenon is identified, a new classification criterion for HDLSS data sets, termed tolerance similarity, is proposed to emphasize maximization of within-class variance on the premise of class separability. Building on this criterion, a novel linear binary classifier, termed No-separated Data Maximum Dispersion classifier (NPDMD), is designed. The main idea of the NPDMD is to spread the samples of the two classes over a large interval in the respective positive or negative space along the projecting direction, provided that the distance between the projected means of the two classes is large enough. The salient features of the proposed NPDMD are: (1) the NPDMD operates well on HDLSS data sets; (2) the NPDMD solves the objective function in the entire feature space to avoid the data piling phenomenon; (3) the NPDMD exploits the low-rank property of the covariance matrix of HDLSS data sets to speed up computation; (4) the NPDMD is suitable for different real-world applications; (5) the NPDMD can be implemented readily using Quadratic Programming. Theoretical properties of the NPDMD are derived, and a series of evaluations is conducted on one simulated and six real-world benchmark data sets, including face classification and mRNA classification. Experimental results and comprehensive studies demonstrate the superiority of the NPDMD in terms of correct classification rate, mean within-group correct classification rate, and area under the ROC curve. (c) 2022 Elsevier Ltd. All rights reserved.
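The low-rank property mentioned in feature (3) can be illustrated with a small sketch. This is not the NPDMD itself, whose objective function is not given in this record; it only checks the general fact that the sample covariance matrix of n samples in d dimensions has rank at most n - 1, which equals the rank of the centered data matrix. The data, the sizes `n` and `d`, and the helper names `centered` and `matrix_rank` are assumptions made for illustration.

```python
import random


def centered(X):
    """Subtract the per-feature mean from every sample (row) of X."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def matrix_rank(A, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]  # work on a copy
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        # Pick the largest remaining entry in column c as the pivot.
        pivot = max(range(rank, rows), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, rows):
            factor = A[r][c] / A[rank][c]
            for k in range(c, cols):
                A[r][k] -= factor * A[rank][k]
        rank += 1
    return rank


# Hypothetical HDLSS setting: far more features than samples.
random.seed(0)
n, d = 6, 50
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

# The d x d sample covariance is Xc^T Xc / (n - 1), so its rank equals
# the rank of the centered n x d matrix Xc, which is at most n - 1.
Xc = centered(X)
rank_centered = matrix_rank(Xc)
print("samples:", n, "features:", d, "covariance rank:", rank_centered)
```

Because the rank is bounded by n - 1 rather than d, computations can in principle be carried out in a subspace of dimension about n instead of d, which is one common way such a property is used to reduce cost.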
Pages: 18