Unsupervised classification of high-dimension and low-sample data with variational autoencoder based dimensionality reduction

Cited by: 3
Authors
Mahmud, Mohammad Sultan [1]
Fu, Xianghua [1, 2]
Affiliations
[1] Shenzhen Univ, Coll Comp & Software Engn, Big Data Inst, Shenzhen 518060, Peoples R China
[2] Shenzhen Technol Univ, Fac Arts & Sci, Shenzhen 518118, Peoples R China
Keywords
HDLSS dataset; dimensionality reduction; variational autoencoder; unsupervised classification; ALGORITHMS; PCA;
DOI
10.1109/icarm.2019.8834333
Chinese Library Classification number
T [Industrial Technology]
Subject classification code
08
Abstract
In data mining research and development, one of the defining challenges is to perform classification or clustering on data with relatively few samples and many dimensions, known as the high-dimensional limited-sample size (HDLSS) problem. Because the sample size is limited, there is not enough training data to train classification models. In addition, the "curse of dimensionality" often restricts the effectiveness of many methods for solving the HDLSS problem. A classification model trained on a limited-sample dataset tends to overfit and cannot achieve a satisfactory result. Thus, an unsupervised method is a better choice for such problems. Given the emergence of deep learning, its many applications, and its promising outcomes, an extensive analysis of deep learning techniques on HDLSS datasets is required. This paper evaluates the performance of variational autoencoder (VAE) based dimensionality reduction and unsupervised classification on HDLSS datasets. The performance of VAE is compared with two existing techniques, namely principal component analysis (PCA) and non-negative matrix factorization (NMF), on fourteen datasets in terms of three evaluation metrics: purity, Rand index, and normalized mutual information (NMI). The experimental results show the superiority of VAE over the traditional methods on the HDLSS datasets.
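The three evaluation metrics named in the abstract can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names are hypothetical, and the record does not say which NMI normalization the paper uses, so the geometric mean of the two entropies is assumed here.

```python
from collections import Counter
from math import comb, log, sqrt


def purity(labels_true, labels_pred):
    """Fraction of samples that belong to the majority true class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels_true)


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two partitions agree (needs >= 2 samples)."""
    total = comb(len(labels_true), 2)
    same_both = sum(comb(v, 2) for v in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(v, 2) for v in Counter(labels_true).values())
    same_pred = sum(comb(v, 2) for v in Counter(labels_pred).values())
    # Agreements = pairs grouped together in both + pairs separated in both.
    return (total + 2 * same_both - same_true - same_pred) / total


def nmi(labels_true, labels_pred):
    """Mutual information divided by sqrt(H(true) * H(pred)) (assumed normalization)."""
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(
        (v / n) * log(n * v / (count_true[t] * count_pred[p]))
        for (t, p), v in joint.items()
    )
    h_true = -sum((v / n) * log(v / n) for v in count_true.values())
    h_pred = -sum((v / n) * log(v / n) for v in count_pred.values())
    if h_true == 0 or h_pred == 0:
        # Degenerate case: at least one partition has a single group.
        return 1.0 if h_true == h_pred else 0.0
    return mi / sqrt(h_true * h_pred)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]      # hypothetical ground-truth classes
    clusters = [1, 1, 0, 0, 0, 0]   # hypothetical cluster assignments
    print(purity(truth, clusters), rand_index(truth, clusters), nmi(truth, clusters))
```

All three scores are invariant to how cluster ids are numbered, which is why they suit unsupervised classification, where cluster ids carry no class meaning.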
Pages: 498-503
Number of pages: 6