Network-based dimensionality reduction of high-dimensional, low-sample-size datasets

Cited by: 6
Authors
Kosztyan, Zsolt T. [1 ,2 ,3 ]
Kurbucz, Marcell T. [1 ,4 ]
Katona, Attila I. [1 ,5 ]
Affiliations
[1] Univ Pannonia, Dept Quantitat Methods, Egyet St 10, H-8200 Veszprem, Hungary
[2] Inst Adv Stud iASK, Chernel St 14, H-9730 Koszeg, Hungary
[3] Eotvos Lorand Res Network ELKH, MTA PE Budapest Ranking Res Grp, Piarista St 4, H-1052 Budapest, Hungary
[4] Wigner Res Ctr Phys, Dept Computat Sci, Konkoly Thege Mikl St 29-33, H-1121 Budapest, Hungary
[5] Corvinus Univ Budapest, Dept Stat, Fovam Sq 8, H-1093 Budapest, Hungary
Keywords
Nonparametric methods; Dimensionality reduction; Community detection; Communality analysis; Principal component analysis; Ensemble; Classification; Consistency; PCA
DOI
10.1016/j.knosys.2022.109180
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the field of data science, a variety of datasets suffer from the high-dimensional, low-sample-size (HDLSS) problem; however, only a few dimensionality reduction methods are applicable to this type of problem, and there is no nonparametric solution to date. The purpose of this work is to develop a novel network-based (nonparametric) dimensionality reduction analysis (NDA) method that can be effectively applied to HDLSS data. First, with the NDA method, the correlation graph of variables is specified. With a modularity-based community detection method, the set of modules is specified. Then, the linear combinations of variables weighted by their eigenvector centralities (EVCs), defined as latent variables (LVs), are determined. In the optional phase of variable selection, variables with low EVCs and low communality are ignored. Then, the set of LVs and the set of indicators belonging to the LVs are specified using the NDA method. NDA is applied to publicly available databases and compared with principal factoring with communality analysis (PFA) methods. The results show that NDA can be effectively applied to HDLSS datasets as it outperforms the existing methods in terms of interpretability. In addition, the application of NDA is easier, since there is no need to specify the number of latent variables due to its nonparametric nature. © 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
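The steps named in the abstract (correlation graph, module detection, EVC-weighted combination) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions that the abstract does not state: edges are squared Pearson correlations kept above a hypothetical threshold `min_r2`, connected components stand in for the modularity-based community detection, the variables are standardized before being combined, and the optional variable-selection phase is omitted. All function names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_graph(columns, min_r2):
    """Weighted adjacency matrix: squared correlation if >= min_r2, else 0 (assumed rule)."""
    p = len(columns)
    adj = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            r2 = pearson(columns[i], columns[j]) ** 2
            if r2 >= min_r2:
                adj[i][j] = adj[j][i] = r2
    return adj

def connected_components(adj):
    """Stand-in for modularity-based community detection (NOT the same technique)."""
    seen, modules = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v, w in enumerate(adj[u]):
                if w > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        modules.append(sorted(comp))
    return modules

def eigenvector_centrality(adj, nodes, iters=200):
    """Power iteration on (A + I) restricted to one module; result sums to 1."""
    vec = [1.0] * len(nodes)
    for _ in range(iters):
        new = [vec[i] + sum(adj[u][v] * vec[j] for j, v in enumerate(nodes))
               for i, u in enumerate(nodes)]
        total = sum(new)
        vec = [x / total for x in new]
    return vec

def standardize(x):
    """Center a variable and scale it to unit (population) standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return [(v - m) / sd for v in x]

def nda_sketch(columns, min_r2=0.5):
    """Return modules, per-module EVC weights, and per-module latent variable scores."""
    adj = correlation_graph(columns, min_r2)
    modules = connected_components(adj)
    z = [standardize(c) for c in columns]
    n_samples = len(columns[0])
    weights, lvs = [], []
    for nodes in modules:
        w = eigenvector_centrality(adj, nodes)
        weights.append(w)
        lvs.append([sum(wi * z[i][s] for wi, i in zip(w, nodes))
                    for s in range(n_samples)])
    return modules, weights, lvs

# Toy HDLSS-style data: 6 variables, 5 samples, built from two uncorrelated signals.
a = [1, 2, 3, 4, 5]
b = [2, -1, -2, -1, 2]
columns = [
    a,
    [2 * v + e for v, e in zip(a, [0.1, -0.1, 0.0, 0.1, -0.1])],
    [0.5 * v + e for v, e in zip(a, [0.0, 0.1, -0.1, 0.0, 0.1])],
    b,
    [v + e for v, e in zip(b, [0.1, 0.0, -0.1, 0.1, 0.0])],
    [3 * v + e for v, e in zip(b, [-0.1, 0.1, 0.0, 0.0, 0.1])],
]
modules, weights, lvs = nda_sketch(columns)
```

In this sketch the number of latent variables is not supplied by the caller; it follows from the number of modules found in the graph, which mirrors the nonparametric property the abstract describes.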
Pages: 14
Related papers
50 in total
  • [21] An adaptive and efficient dimensionality reduction algorithm for high-dimensional indexing
    Jin, H
    Ooi, BC
    Shen, HT
    Yu, C
    Zhou, AY
    [J]. 19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, : 87 - 98
  • [22] An adaptive and dynamic dimensionality reduction method for high-dimensional indexing
    Heng Tao Shen
    Xiaofang Zhou
    Aoying Zhou
    [J]. The VLDB Journal, 2007, 16 : 219 - 234
  • [23] Dimensionality reduction for density ratio estimation in high-dimensional spaces
    Sugiyama, Masashi
    Kawanabe, Motoaki
    Chui, Pui Ling
    [J]. NEURAL NETWORKS, 2010, 23 (01) : 44 - 59
  • [24] An adaptive and dynamic dimensionality reduction method for high-dimensional indexing
    Shen, Heng Tao
    Zhou, Xiaofang
    Zhou, Aoying
    [J]. VLDB JOURNAL, 2007, 16 (02): : 219 - 234
  • [25] Incorporating Texture Information into Dimensionality Reduction for High-Dimensional Images
    Vieth, A.
    Vilanova, A.
    Lelieveldt, B.
    Eisemann, E.
    Hollt, T.
    [J]. 2022 IEEE 15TH PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS 2022), 2022, : 11 - 20
  • [26] Efficient indexing of high-dimensional data through dimensionality reduction
    Goh, CH
    Lim, A
    Ooi, BC
    Tan, KL
    [J]. DATA & KNOWLEDGE ENGINEERING, 2000, 32 (02) : 115 - 130
  • [27] A dimensionality reduction method for efficient search of high-dimensional databases
    Aghbari, Z
    Kaneko, K
    Makinouchi, A
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2003, E86D (06): : 1032 - 1041
  • [28] Sensitivity analysis approaches to high-dimensional screening problems at low sample size
    Becker, W. E.
    Tarantola, S.
    Deman, G.
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (11) : 2089 - 2110
  • [29] The analysis on dimensionality reduction mathematical model based on feedback constraint for High-dimensional information
    Peng, Wu
    [J]. ADVANCES IN MECHATRONICS, AUTOMATION AND APPLIED INFORMATION TECHNOLOGIES, PTS 1 AND 2, 2014, 846-847 : 1056 - 1059
  • [30] Registration of high-dimensional remote sensing data based on a new dimensionality reduction rule
    Xu, Min
    Chen, Hao
    Varshney, Pramod K.
    [J]. 2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 581 - 584