Variational Autoencoder-Based Dimensionality Reduction for High-Dimensional Small-Sample Data Classification

Cited by: 24
Authors
Mahmud, Mohammad Sultan [1]
Huang, Joshua Zhexue
Fu, Xianghua
Affiliations
[1] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-dimensional and small-sample size dataset; variational autoencoder; classification; computational biology; deep learning; NONNEGATIVE MATRIX FACTORIZATION; PRINCIPAL COMPONENT ANALYSIS; GENE;
DOI
10.1142/S1469026820500029
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification problems in which the number of features (dimensions) far exceeds the number of samples (observations) are an essential research and application area in a variety of domains, especially computational biology. This setting is known as the high-dimensional small-sample-size (HDSSS) problem. Various dimensionality reduction methods have been developed, but they are not effective on small-sample, high-dimensional datasets and suffer from overfitting and high-variance gradients. To overcome the pitfalls of sample size and dimensionality, this study employs the variational autoencoder (VAE), which has emerged in recent years as a framework for unsupervised learning. The objective is to find a reliable classification model for high-dimensional, small-sample datasets with minimal error. The study also evaluates the strength of different VAE architectures on HDSSS datasets. In the experiments, six genomic microarray datasets from the Kent Ridge Biomedical Dataset Repository were selected, and several choices of dimension (number of features) were applied in data preprocessing. To evaluate classification accuracy and to find a stable and suitable classifier, nine state-of-the-art classifiers that have been successful in high-dimensional classification tasks were selected. The experimental results demonstrate that the VAE can provide superior performance compared to traditional methods such as PCA, FastICA, FA, NMF, and LDA in terms of accuracy and AUROC.
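The abstract does not describe the VAE objective itself, so the following is only a minimal sketch of the two ingredients a standard VAE relies on: the reparameterized latent sample and the closed-form KL term against a standard normal prior. The function names, the squared-error reconstruction term, and the diagonal-Gaussian encoder are assumptions made for illustration; they are not taken from the paper and may differ from the authors' architecture.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def negative_elbo(x, x_recon, mu, log_var):
    """Per-sample loss: reconstruction error plus KL regularizer.

    Assumption: squared error stands in for the reconstruction
    log-likelihood (a Gaussian decoder, up to constants).
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -1.0], [0.0, -2.0]   # hypothetical encoder outputs
    z = reparameterize(mu, log_var, rng)     # low-dimensional code for one sample
    print(len(z), kl_to_standard_normal(mu, log_var))
```

In a dimensionality-reduction setting like the one described, the encoder mean `mu` (or a sample `z`) would serve as the reduced feature vector passed to a downstream classifier; the encoder and decoder networks themselves are omitted here.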
Pages: 19