Variational Autoencoder-Based Dimensionality Reduction for High-Dimensional Small-Sample Data Classification

Cited by: 24
Authors
Mahmud, Mohammad Sultan [1 ]
Huang, Joshua Zhexue
Fu, Xianghua
Affiliations
[1] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
High-dimensional and small-sample size dataset; variational autoencoder; classification; computational biology; deep learning; NONNEGATIVE MATRIX FACTORIZATION; PRINCIPAL COMPONENT ANALYSIS; GENE;
DOI
10.1142/S1469026820500029
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification problems in which the number of features (dimensions) greatly exceeds the number of samples (observations) are an important research and application area in many domains, especially computational biology; such problems are known as high-dimensional small-sample-size (HDSSS) problems. Various dimensionality reduction methods have been developed, but they perform poorly on small-sample, high-dimensional datasets and suffer from overfitting and high-variance gradients. To overcome the pitfalls of small sample size and high dimensionality, this study employed the variational autoencoder (VAE), a framework for unsupervised learning that has gained prominence in recent years. The objective of this study is to investigate a reliable classification model for high-dimensional, small-sample-sized datasets with minimal error, and to evaluate the strength of different VAE architectures on HDSSS datasets. In the experiment, six genomic microarray datasets from the Kent Ridge Biomedical Dataset Repository were selected, and several choices of reduced dimension (number of features) were applied during data preprocessing. To evaluate classification accuracy and to find a stable, suitable classifier, nine state-of-the-art classifiers that have been successful for classification tasks in high-dimensional settings were selected. The experimental results demonstrate that the VAE provides superior performance compared to traditional methods such as PCA, fastICA, FA, NMF, and LDA in terms of accuracy and AUROC.
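The pipeline the abstract describes — encode high-dimensional samples into a low-dimensional latent space, then train a classifier on the latent features — can be illustrated with a minimal NumPy sketch of the VAE encoding step. This is a hypothetical, untrained example: the encoder weights `W_mu` and `W_logvar` are random stand-ins for parameters that, in the actual method, would be learned by maximizing the evidence lower bound (ELBO); the data shapes are arbitrary placeholders for a microarray dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic HDSSS data: 60 samples with 2000 features,
# a hypothetical stand-in for a genomic microarray dataset.
n_samples, n_features, latent_dim = 60, 2000, 10
X = rng.normal(size=(n_samples, n_features))

# Hypothetical linear encoder weights (learned via the ELBO in practice).
W_mu = rng.normal(scale=0.01, size=(n_features, latent_dim))
W_logvar = rng.normal(scale=0.01, size=(n_features, latent_dim))

def encode(x):
    """Map inputs to the parameters of q(z|x) = N(mu, diag(exp(logvar)))."""
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    so sampling stays differentiable with respect to the encoder weights."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

mu, logvar = encode(X)
Z = reparameterize(mu, logvar)
# Z is the 60 x 10 latent representation that would be fed
# to a downstream classifier in place of the raw 2000 features.
print(Z.shape)
```

The key design point, relative to a plain autoencoder, is that the encoder outputs a distribution (`mu`, `logvar`) rather than a point, and the reparameterization trick keeps the sampled latent code differentiable during training.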
Pages: 19
Related Papers
50 items total
  • [31] An Optimized Dimensionality Reduction Model for High-dimensional Data Based on Restricted Boltzmann Machines
    Zhang, Ke
    Liu, Jianhuan
    Chai, Yi
    Qian, Kun
    2015 27TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2015, : 2963 - 2968
  • [32] Autoencoder-Based Fusion Classification of Hyperspectral and LiDAR Data
    Wang Yibo
    Dai Song
    Song Dongmei
    Cao Guofa
    Ren Jie
    LASER & OPTOELECTRONICS PROGRESS, 2024, 61 (12)
  • [33] High-Dimensional Expensive Optimization by Classification-based Multiobjective Evolutionary Algorithm with Dimensionality Reduction
    Horaguchi, Yuma
    Nakata, Masaya
    2023 62ND ANNUAL CONFERENCE OF THE SOCIETY OF INSTRUMENT AND CONTROL ENGINEERS, SICE, 2023, : 1535 - 1542
  • [34] Optimized Mahalanobis-Taguchi System for High-Dimensional Small Sample Data Classification
    Xiao, Xinping
    Fu, Dian
    Shi, Yu
    Wen, Jianghui
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2020, 2020
  • [35] Multisource Latent Feature Selective Ensemble Modeling Approach for Small-Sample High-Dimensional Process Data in Applications
    Tang, Jian
    Zhang, Jian
    Yu, Gang
    Zhang, Wenping
    Yu, Wen
    IEEE ACCESS, 2020, 8 : 148475 - 148488
  • [36] Forward Stepwise Deep Autoencoder-Based Monotone Nonlinear Dimensionality Reduction Methods
    Fong, Youyi
    Xu, Jun
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2021, 30 (03) : 519 - 529
  • [37] Factor-analytic Inverse Regression for High-dimension, Small-sample Dimensionality Reduction
    Jha, Aditi
    Morais, Michael J.
    Pillow, Jonathan W.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Guanglei Meng
    Biao Wang
    Yanming Wu
    Mingzhe Zhou
    Tiankuo Meng
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 3705 - 3718
  • [39] Dependence maps, a dimensionality reduction with dependence distance for high-dimensional data
    Lee, Kichun
    Gray, Alexander
    Kim, Heeyoung
    DATA MINING AND KNOWLEDGE DISCOVERY, 2013, 26 (03) : 512 - 532
  • [40] A hybrid dimensionality reduction method for outlier detection in high-dimensional data
    Meng, Guanglei
    Wang, Biao
    Wu, Yanming
    Zhou, Mingzhe
    Meng, Tiankuo
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (11) : 3705 - 3718