A novel autoencoder approach to feature extraction with linear separability for high-dimensional data

Authors
Zheng J. [1]
Qu H. [1, 2]
Li Z. [1]
Li L. [1]
Tang X. [2]
Guo F. [2]
Affiliations
[1] College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing
[2] College of Automation, Chongqing University of Posts and Telecommunications, Chongqing
Funding
National Natural Science Foundation of China
Keywords
Autoencoder; Distance metric; Feature extraction
DOI
10.7717/PEERJ-CS.1061
Abstract
Feature extraction typically relies on sufficient information in the input data. In a high-dimensional space, however, the data are distributed too sparsely to provide that information. High dimensionality also makes it harder to search for features scattered across subspaces, so extracting features from high-dimensional data is a difficult task. To address this issue, this article proposes a novel autoencoder method that uses a Mahalanobis distance metric of rescaling transformation. The key idea is that applying this metric reduces the difference between the reconstructed distribution and the original distribution, which improves the feature extraction ability of the autoencoder. Results show that the proposed approach outperforms state-of-the-art methods in both the accuracy of feature extraction and the linear separability of the extracted features. The results indicate that distance metric-based methods are better suited than feature selection-based methods to extracting linearly separable features from high-dimensional data. In a high-dimensional space, evaluating feature similarity is relatively easier than evaluating feature importance, so distance metric methods, which evaluate feature similarity, have an advantage over feature selection methods, which assess feature importance. Evaluating feature importance is, however, more computationally efficient than evaluating feature similarity. © 2022 Zheng et al.
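As a rough illustration of what a rescaling-type Mahalanobis metric could look like when used as a reconstruction error, the sketch below may help. It is not the authors' implementation: the abstract does not specify the metric's form, so the code assumes a diagonal metric matrix M = diag(scale)^2, and the function names (`rescaled_mahalanobis`, `inverse_std_scale`, `reconstruction_loss`) and the inverse-standard-deviation choice of scale are hypothetical.

```python
import math


def rescaled_mahalanobis(x, y, scale):
    """Distance under an assumed diagonal Mahalanobis metric.

    With M = diag(scale)^2, d(x, y) = sqrt((x - y)^T M (x - y)),
    i.e. a Euclidean distance after rescaling each feature.
    """
    if not (len(x) == len(y) == len(scale)):
        raise ValueError("x, y and scale must have the same length")
    return math.sqrt(sum((s * (a - b)) ** 2 for a, b, s in zip(x, y, scale)))


def inverse_std_scale(rows, eps=1e-12):
    """Hypothetical rescaling: 1 / standard deviation of each feature."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    variances = [
        sum((r[j] - means[j]) ** 2 for r in rows) / n for j in range(dims)
    ]
    # eps guards against division by zero for constant features.
    return [1.0 / math.sqrt(v + eps) for v in variances]


def reconstruction_loss(inputs, reconstructions, scale):
    """Mean squared rescaled distance between inputs and reconstructions."""
    total = sum(
        rescaled_mahalanobis(x, r, scale) ** 2
        for x, r in zip(inputs, reconstructions)
    )
    return total / len(inputs)
```

In an autoencoder, a loss of this kind would stand in for the plain squared Euclidean reconstruction error, so that features with very different spreads contribute comparably. Whether the article's metric is diagonal, learned, or derived from a covariance estimate is not stated in the abstract.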
Related papers
50 in total
  • [31] Feature selection for high-dimensional imbalanced data
    Yin, Liuzhi
    Ge, Yong
    Xiao, Keli
    Wang, Xuehua
    Quan, Xiaojun
    NEUROCOMPUTING, 2013, 105 : 3 - 11
  • [32] AutoEncoder based High-Dimensional Data Fault Detection System
    Fan, Jicong
    Wang, Wei
    Zhang, Haijun
    2017 IEEE 15TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2017: 1001 - 1006
  • [33] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [34] Feature selection for high-dimensional temporal data
    Tsagris, Michail
    Lagani, Vincenzo
    Tsamardinos, Ioannis
    BMC BIOINFORMATICS, 2018, 19
  • [36] Feature Selection with High-Dimensional Imbalanced Data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    Wald, Randall
    2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009: 507 - 514
  • [37] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    ECTA 2011/FCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION THEORY AND APPLICATIONS AND INTERNATIONAL CONFERENCE ON FUZZY COMPUTATION THEORY AND APPLICATIONS, 2011
  • [38] Cluster feature selection in high-dimensional linear models
    Lin, Bingqing
    Pang, Zhen
    Wang, Qihua
    RANDOM MATRICES-THEORY AND APPLICATIONS, 2018, 7 (01)
  • [39] Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection
    Poelsterl, Sebastian
    Conjeti, Sailesh
    Navab, Nassir
    Katouzian, Amin
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2016, 72 : 1 - 11
  • [40] MULTISCALE GEOMETRIC FEATURE EXTRACTION FOR HIGH-DIMENSIONAL AND NON-EUCLIDEAN DATA WITH APPLICATIONS
    Chandler, Gabriel
    Polonik, Wolfgang
    ANNALS OF STATISTICS, 2021, 49 (02): 988 - 1010