Variational Autoencoder-Based Dimensionality Reduction for High-Dimensional Small-Sample Data Classification

Cited by: 30
Authors
Mahmud, Mohammad Sultan [1]
Huang, Joshua Zhexue
Fu, Xianghua
Affiliation
[1] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-dimensional and small-sample size dataset; variational autoencoder; classification; computational biology; deep learning; nonnegative matrix factorization; principal component analysis; gene
DOI
10.1142/S1469026820500029
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification problems in which the number of features (dimensions) greatly exceeds the number of samples (observations) are an important research and application area in many domains, especially computational biology. This setting is known as the high-dimensional small-sample-size (HDSSS) problem. Various dimensionality reduction methods have been developed, but they perform poorly on high-dimensional, small-sample-size datasets and suffer from overfitting and high-variance gradients. To overcome the pitfalls of small sample size and high dimensionality, this study employs the variational autoencoder (VAE), a framework for unsupervised learning that has attracted wide attention in recent years. The objective of this study is to find a reliable classification model with minimal error for high-dimensional, small-sample-size datasets. In addition, it evaluates the strength of different VAE architectures on HDSSS datasets. In the experiments, six genomic microarray datasets from the Kent Ridge Biomedical Dataset Repository were selected, and several choices of the number of dimensions (features) were applied during data preprocessing. To evaluate classification accuracy and to identify a stable and suitable classifier, nine state-of-the-art classifiers that have been successful in high-dimensional classification tasks were selected. The experimental results demonstrate that the VAE can outperform traditional methods such as PCA, FastICA, FA, NMF, and LDA in terms of accuracy and AUROC.
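The abstract describes a two-stage pipeline: a VAE compresses the many input features into a low-dimensional latent space, and a conventional classifier is then trained on that representation. The sketch below illustrates this idea under explicit assumptions and is not the authors' implementation. It uses a linear encoder and decoder (so the gradients can be written by hand in NumPy), a unit-variance Gaussian reconstruction term, synthetic data in place of the Kent Ridge microarray datasets, and a nearest-centroid classifier as a stand-in for the nine classifiers evaluated in the paper. The function names (`train_linear_vae`, `encode`, `nearest_centroid_predict`) and all hyperparameters are hypothetical.

```python
import numpy as np


def train_linear_vae(X, latent_dim=5, epochs=400, lr=2e-3, seed=0):
    """Fit a minimal linear Gaussian VAE by full-batch gradient descent.

    Assumed model (illustrative only, not the paper's architecture):
      Encoder: mu = X @ W_mu + b_mu,  logvar = X @ W_lv + b_lv
      Decoder: x_hat = z @ W_dec + b_dec,  z = mu + exp(logvar / 2) * eps
      Loss (averaged over samples): 0.5 * ||x - x_hat||^2 + KL(q(z|x) || N(0, I))
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = latent_dim
    p = {
        "W_mu": rng.normal(0.0, 1.0 / np.sqrt(d), (d, k)), "b_mu": np.zeros(k),
        "W_lv": np.zeros((d, k)), "b_lv": np.zeros(k),
        "W_dec": rng.normal(0.0, 1.0 / np.sqrt(d), (k, d)), "b_dec": np.zeros(d),
    }
    history = []
    for _ in range(epochs):
        # Forward pass with the reparameterisation trick.
        mu = X @ p["W_mu"] + p["b_mu"]
        logvar = X @ p["W_lv"] + p["b_lv"]
        std = np.exp(0.5 * logvar)
        eps = rng.standard_normal((n, k))
        z = mu + std * eps
        x_hat = z @ p["W_dec"] + p["b_dec"]
        recon = 0.5 * np.sum((x_hat - X) ** 2) / n
        kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar)) / n
        history.append(float(recon + kl))
        # Backward pass: hand-derived gradients of the averaged loss.
        g_xhat = (x_hat - X) / n
        g_z = g_xhat @ p["W_dec"].T
        g_mu = g_z + mu / n
        g_lv = g_z * eps * 0.5 * std + 0.5 * (np.exp(logvar) - 1.0) / n
        grads = {
            "W_mu": X.T @ g_mu, "b_mu": g_mu.sum(axis=0),
            "W_lv": X.T @ g_lv, "b_lv": g_lv.sum(axis=0),
            "W_dec": z.T @ g_xhat, "b_dec": g_xhat.sum(axis=0),
        }
        # Update only after every gradient has been computed.
        for name, g in grads.items():
            p[name] -= lr * g
    return p, history


def encode(X, p):
    """Use the posterior mean as the reduced (latent) feature vector."""
    return X @ p["W_mu"] + p["b_mu"]


def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Stand-in classifier: assign each test point to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


# Hypothetical synthetic HDSSS data: 60 samples, 200 features, 2 classes.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 30)
X = rng.standard_normal((60, 200))
X[y == 1, :20] += 1.5  # class signal lives in the first 20 features only
idx = rng.permutation(60)
tr, te = idx[:40], idx[40:]
# Standardise with training statistics only, to avoid test-set leakage.
mean, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
X_train, X_test = (X[tr] - mean) / sd, (X[te] - mean) / sd

params, history = train_linear_vae(X_train)
Z_train, Z_test = encode(X_train, params), encode(X_test, params)
pred = nearest_centroid_predict(Z_train, y[tr], Z_test)
accuracy = float(np.mean(pred == y[te]))
print(f"loss {history[0]:.1f} -> {history[-1]:.1f}, test accuracy {accuracy:.2f}")
```

The posterior mean `mu` serves as the reduced representation because it is deterministic at prediction time, whereas sampling `z` would inject noise into the classifier's inputs. A practical variant would typically replace the linear maps with deeper nonlinear layers trained through an autodiff library, and would tune the latent dimension together with the downstream classifier.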
Pages: 19