Unsupervised classification of high-dimension and low-sample data with variational autoencoder based dimensionality reduction

Cited by: 3
Authors
Mahmud, Mohammad Sultan [1 ]
Fu, Xianghua [1 ,2 ]
Affiliations
[1] Shenzhen Univ, Coll Comp & Software Engn, Big Data Inst, Shenzhen 518060, Peoples R China
[2] Shenzhen Technol Univ, Fac Arts & Sci, Shenzhen 518118, Peoples R China
Keywords
HDLSS dataset; dimensionality reduction; variational autoencoder; unsupervised classification; algorithms; PCA
DOI
10.1109/icarm.2019.8834333
CLC number
T [Industrial Technology]
Subject classification number
08
Abstract
In data mining research and development, one of the defining challenges is to perform classification or clustering on data with relatively few samples and many dimensions, also known as the high-dimension, low-sample-size (HDLSS) problem. Due to the limited sample size, there is not enough training data to fit classification models, and the `curse of dimensionality' often restricts the effectiveness of many methods for solving the HDLSS problem. Classification models trained on limited-sample datasets tend to overfit and cannot achieve satisfactory results, so unsupervised methods are a better choice for such problems. Given the emergence of deep learning, its wealth of applications, and its promising outcomes, an extensive analysis of deep learning techniques on HDLSS datasets is needed. This paper evaluates the performance of variational autoencoder (VAE) based dimensionality reduction and unsupervised classification on HDLSS datasets. The performance of the VAE is compared with two existing techniques, namely PCA and NMF, on fourteen datasets in terms of three evaluation metrics: purity, Rand index, and NMI. The experimental results show the superiority of the VAE over the traditional methods on HDLSS datasets.
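The evaluation pipeline the abstract describes (reduce dimensionality, cluster the embedded samples, then score the clustering against ground-truth labels with purity, Rand index, and NMI) can be sketched as below. This is a minimal illustration, not the authors' code: it uses scikit-learn with PCA as the reducer (in the paper's setting a trained VAE encoder would supply the low-dimensional embedding) on synthetic HDLSS-like data, and the sample/feature counts are arbitrary choices for the sketch.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import rand_score, normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix


def purity_score(y_true, y_pred):
    """Purity: fraction of samples that fall in the majority true class
    of their assigned cluster."""
    cm = contingency_matrix(y_true, y_pred)
    return np.sum(np.amax(cm, axis=0)) / np.sum(cm)


# Synthetic HDLSS-like data: 60 samples, 200 features, 3 classes.
X, y = make_blobs(n_samples=60, n_features=200, centers=3, random_state=0)

# Dimensionality reduction (PCA stand-in; the paper embeds with a VAE's latent space).
Z = PCA(n_components=5, random_state=0).fit_transform(X)

# Unsupervised classification: k-means on the low-dimensional embedding.
pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Z)

print(f"purity     = {purity_score(y, pred):.3f}")
print(f"Rand index = {rand_score(y, pred):.3f}")
print(f"NMI        = {normalized_mutual_info_score(y, pred):.3f}")
```

Swapping PCA for `sklearn.decomposition.NMF` (after shifting `X` to be non-negative) reproduces the paper's second baseline under the same three metrics.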
Pages: 498-503
Page count: 6
Related papers
50 records in total
  • [21] Classification for Time Series Data. An Unsupervised Approach Based on Reduction of Dimensionality
    Landaluce-Calvo, M. Isabel
    Modroño-Herrán, Juan I.
    [J]. JOURNAL OF CLASSIFICATION, 2020, 37 (02) : 380 - 398
  • [23] Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder
    Chadebec, Clement
    Thibeau-Sutre, Elina
    Burgos, Ninon
    Allassonniere, Stephanie
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 2879 - 2896
  • [24] Performance of feature-selection methods in the classification of high-dimension data
    Hua, Jianping
    Tembe, Waibhav D.
    Dougherty, Edward R.
    [J]. PATTERN RECOGNITION, 2009, 42 (03) : 409 - 424
  • [25] ON SIMULTANEOUS CALIBRATION OF TWO-SAMPLE t-TESTS FOR HIGH-DIMENSION LOW-SAMPLE-SIZE DATA
    Zhang, Chunming
    Jia, Shengji
    Wu, Yongfeng
    [J]. STATISTICA SINICA, 2021, 31 (03) : 1189 - 1214
  • [26] Second Order Expansions for High-Dimension Low-Sample-Size Data Statistics in Random Setting
    Christoph, Gerd
    Ulyanov, Vladimir V.
    [J]. MATHEMATICS, 2020, 8 (07)
  • [27] Data maximum dispersion classifier in projection space for high-dimension low-sample-size problems
    Shen, Liran
    Yin, Qingbo
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 193
  • [28] Effective PCA for high-dimension, low-sample-size data with singular value decomposition of cross data matrix
    Yata, Kazuyoshi
    Aoshima, Makoto
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2010, 101 (09) : 2060 - 2077
  • [29] Personalized PageRank Based Feature Selection for High-dimension Data
    Zhu, Zhibo
    Peng, Qinke
    Guan, Xinyu
    [J]. PROCEEDINGS OF 2019 11TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE 2019), 2019, : 197 - 202
  • [30] A two-sample test for high-dimension, low-sample-size data under the strongly spiked eigenvalue model
    Ishii, Aki
    [J]. HIROSHIMA MATHEMATICAL JOURNAL, 2017, 47 (03) : 273 - 288