Network-based dimensionality reduction of high-dimensional, low-sample-size datasets

Cited by: 6
Authors
Kosztyan, Zsolt T. [1, 2, 3]
Kurbucz, Marcell T. [1, 4]
Katona, Attila I. [1, 5]
Affiliations
[1] Univ Pannonia, Dept Quantitat Methods, Egyet St 10, H-8200 Veszprem, Hungary
[2] Inst Adv Stud iASK, Chernel St 14, H-9730 Koszeg, Hungary
[3] Eotvos Lorand Res Network ELKH, MTA PE Budapest Ranking Res Grp, Piarista St 4, H-1052 Budapest, Hungary
[4] Wigner Res Ctr Phys, Dept Computat Sci, Konkoly Thege Mikl St 29-33, H-1121 Budapest, Hungary
[5] Corvinus Univ Budapest, Dept Stat, Fovam Sq 8, H-1093 Budapest, Hungary
Keywords
Nonparametric methods; Dimensionality reduction; Community detection; Communality analysis; Principal component analysis; Ensemble; Classification; Consistency; PCA
DOI
10.1016/j.knosys.2022.109180
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Many datasets in data science suffer from the high-dimensional, low-sample-size (HDLSS) problem; however, only a few dimensionality reduction methods are applicable to this type of problem, and no nonparametric solution exists to date. This work develops a novel network-based (nonparametric) dimensionality reduction analysis (NDA) method that can be effectively applied to HDLSS data. First, the correlation graph of the variables is specified. Next, a modularity-based community detection method identifies the set of modules. Then, latent variables (LVs) are determined as linear combinations of the variables weighted by their eigenvector centralities (EVCs). In an optional variable selection phase, variables with low EVCs and low communality are ignored. Finally, the NDA method specifies the set of LVs and the set of indicators belonging to each LV. NDA is applied to publicly available databases and compared with principal factoring with community analysis (PFA) methods. The results show that NDA can be effectively applied to HDLSS datasets, as it outperforms the existing methods in terms of interpretability. In addition, NDA is easier to apply: because it is nonparametric, the number of latent variables need not be specified. © 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
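The steps named in the abstract (correlation graph, modules, EVC-weighted latent variables) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: connected components of a thresholded squared-correlation graph stand in for modularity-based community detection, the threshold of 0.5 is arbitrary, eigenvector centrality is approximated by power iteration, the optional variable selection phase is omitted, and all function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_graph(columns, threshold):
    """Weighted adjacency matrix: squared correlations at or above the threshold."""
    p = len(columns)
    adj = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            r2 = pearson(columns[i], columns[j]) ** 2
            if r2 >= threshold:
                adj[i][j] = adj[j][i] = r2
    return adj

def find_modules(adj):
    """Connected components with at least two variables.

    Assumption: a simple stand-in for modularity-based community detection.
    """
    p, seen, modules = len(adj), set(), []
    for start in range(p):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            u = stack.pop()
            group.append(u)
            for v in range(p):
                if adj[u][v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        if len(group) > 1:
            modules.append(sorted(group))
    return modules

def eigenvector_centrality(adj, nodes, iters=200):
    """Power iteration on (A + I) restricted to one module, scaled so max = 1."""
    k = len(nodes)
    vec = [1.0] * k
    for _ in range(iters):
        new = [vec[a] + sum(adj[nodes[a]][nodes[b]] * vec[b] for b in range(k))
               for a in range(k)]
        top = max(new)
        vec = [v / top for v in new]
    return vec

def latent_variables(columns, threshold=0.5):
    """Return (modules, lvs): one EVC-weighted average of variables per module."""
    adj = correlation_graph(columns, threshold)
    modules = find_modules(adj)
    lvs = []
    for nodes in modules:
        w = eigenvector_centrality(adj, nodes)
        total = sum(w)
        n = len(columns[0])
        lvs.append([sum(w[a] * columns[nodes[a]][i] for a in range(len(nodes))) / total
                    for i in range(n)])
    return modules, lvs

# Toy data (made up): 5 variables observed on 4 samples, stored column-wise.
columns = [
    [1, 2, 3, 4], [2, 4, 6, 8],      # two variables that move together
    [1, -1, -1, 1], [2, -2, -2, 2],  # a second, unrelated pair
    [1, -1, 1, -1],                  # weakly related to everything else
]
modules, lvs = latent_variables(columns)
print(modules)
print(lvs)
```

Note that the number of latent variables is not passed in; it falls out of the graph structure, which mirrors the abstract's point about not having to specify it. In this sketch the raw variables are averaged without standardization or sign alignment, so negatively correlated variables in the same module would partly cancel.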
Pages: 14