Variational Autoencoder-Based Dimensionality Reduction for High-Dimensional Small-Sample Data Classification

Cited by: 24
Authors
Mahmud, Mohammad Sultan [1]
Huang, Joshua Zhexue
Fu, Xianghua
Affiliations
[1] Shenzhen Univ, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
High-dimensional and small-sample size dataset; variational autoencoder; classification; computational biology; deep learning; NONNEGATIVE MATRIX FACTORIZATION; PRINCIPAL COMPONENT ANALYSIS; GENE
DOI
10.1142/S1469026820500029
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification problems in which the number of features (dimensions) far exceeds the number of samples (observations) are an essential research and application area in a variety of domains, especially computational biology. This setting is known as the high-dimensional small-sample-size (HDSSS) problem. Various dimensionality reduction methods have been developed, but they perform poorly on small-sample, high-dimensional datasets and suffer from overfitting and high-variance gradients. To overcome the pitfalls of sample size and dimensionality, this study employs the variational autoencoder (VAE), which has emerged in recent years as a versatile framework for unsupervised learning. The objective of the study is to find a reliable classification model with minimal error for high-dimensional, small-sample datasets. It also evaluates the strength of different VAE architectures on HDSSS datasets. In the experiments, six genomic microarray datasets from the Kent Ridge Biomedical Dataset Repository were selected, and several choices of dimension (number of features) were applied during data preprocessing. To evaluate classification accuracy and to find a stable and suitable classifier, nine state-of-the-art classifiers that have been successful in high-dimensional classification tasks were selected. The experimental results demonstrate that the VAE can outperform traditional methods such as PCA, FastICA, FA, NMF, and LDA in terms of accuracy and AUROC.
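For orientation, the objective that a VAE is usually trained on (the evidence lower bound) can be sketched as below. This is the textbook formulation and an assumption here: the abstract does not state the exact loss or the architectures used in the study. The symbols x (an input sample), z (the latent code), d (the latent dimension), and the parameters \theta, \phi are introduced for illustration only.

```latex
% Standard VAE objective (ELBO), maximized over \theta and \phi
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)

% Closed-form KL term, assuming a diagonal Gaussian encoder
% q_\phi(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))
% and a standard normal prior p(z) = \mathcal{N}(0, I)
D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)
  = -\frac{1}{2} \sum_{j=1}^{d} \bigl(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\bigr)
```

When a VAE is used for dimensionality reduction, a common choice is to take the encoder mean \mu(x), with d much smaller than the number of input features, as the reduced representation passed to a classifier; whether the study does exactly this is not stated in the abstract.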
Pages: 19