Measuring the quality of projections of high-dimensional labeled data

Cited by: 1
Authors
Benato, Barbara C. [1]
Falcao, Alexandre X. [1]
Telea, Alexandru C. [2]
Affiliations
[1] Univ Estadual Campinas, Inst Comp, Ave Albert Einstein 1251, BR-13083852 Campinas, Brazil
[2] Univ Utrecht, Fac Sci, Dept Informat & Comp Sci, Utrecht, Netherlands
Source
COMPUTERS & GRAPHICS-UK | 2023 / Vol. 116
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Quality of projections; Labeled data; Pseudo labeling; REDUCTION; ALGORITHMS;
DOI
10.1016/j.cag.2023.08.023
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification codes
081202; 0835;
Abstract
Dimensionality reduction techniques, also called projections, are among the main tools for visualizing high-dimensional data. Several quality metrics have been proposed to compare such techniques. However, these metrics may not capture the visual separation among groups/classes of samples in a projection, i.e., the extent to which groups of similar (same-label) points lie far from groups of other (distinct-label) points. To address this, we propose a pseudo-labeling mechanism that assesses visual separation using the performance of a semi-supervised optimum-path forest classifier (OPFSemi), measured by Cohen's Kappa. We argue that lower label-propagation errors by OPFSemi in a projection indicate higher data/visual separation. When computing the optimum connectivity between samples in a projection for label propagation, OPFSemi exploits both local and global information about the data distribution. It is parameter-free, fast to compute, and easy to implement, and it generically handles any high-dimensional quantitative labeled dataset and any projection technique. We compare our approach with four scalar metrics commonly used in the literature on 18 datasets and 39 projection techniques. Our results show that the proposed metric consistently scores values in line with the perceived visual separation, surpassing existing projection-quality metrics in this respect. (c) 2023 Elsevier Ltd. All rights reserved.
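The scoring idea in the abstract works in two steps. First, labels are propagated from a few labeled samples through the 2-D projection. Second, the propagated labels are compared with the true labels using Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e). The plain-Python sketch below illustrates both steps under stated assumptions. It is NOT the authors' OPFSemi. OPFSemi's own seed/prototype selection is replaced here by simply treating every caller-supplied labeled sample as a seed. Those seeds then compete through an f_max (minimax) path-cost rule on the complete graph of projected points, in the spirit of optimum-path forest classifiers. The names `minimax_propagate`, `cohen_kappa`, and `separation_score` are hypothetical.

```python
import heapq
import math
from collections import Counter


def minimax_propagate(points, seeds):
    """Simplified stand-in for OPF-style label propagation (not OPFSemi).

    points: list of (x, y) coordinates of the projected samples.
    seeds:  dict {point index: label}; each seed starts with path cost 0.
    Each point receives the label of the seed that reaches it with the
    smallest f_max path cost (the largest Euclidean edge along the path).
    """
    n = len(points)
    cost = [math.inf] * n
    label = [None] * n
    heap = []
    for i, lab in seeds.items():
        cost[i] = 0.0
        label[i] = lab
        heapq.heappush(heap, (0.0, i))
    done = [False] * n
    while heap:
        c, s = heapq.heappop(heap)
        if done[s]:
            continue  # stale heap entry
        done[s] = True
        # Complete graph: every unfinished point is a neighbor of s.
        for t in range(n):
            if done[t]:
                continue
            new_cost = max(c, math.dist(points[s], points[t]))
            if new_cost < cost[t]:
                cost[t] = new_cost
                label[t] = label[s]
                heapq.heappush(heap, (new_cost, t))
    return label


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = sum(a == b for a, b in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # both labelings use one identical class everywhere
    return (p_o - p_e) / (1.0 - p_e)


def separation_score(points, true_labels, seeds):
    """Kappa of propagated vs. true labels on the non-seed samples."""
    pred = minimax_propagate(points, seeds)
    rest = [i for i in range(len(points)) if i not in seeds]
    return cohen_kappa([true_labels[i] for i in rest], [pred[i] for i in rest])


if __name__ == "__main__":
    # Made-up toy projection with two well-separated classes.
    points = [(0, 0), (0.1, 0.2), (0.2, 0.1), (5, 5), (5.1, 5.2), (5.2, 4.9)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(separation_score(points, labels, seeds={0: "a", 3: "b"}))  # prints 1.0
```

On the toy data, the well-separated classes are recovered exactly, so kappa is 1.0. When the classes interleave in the projection, propagation errors push kappa toward 0. This mirrors, in simplified form, the abstract's argument that fewer propagation errors indicate better visual separation.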
Pages: 287-297
Number of pages: 11
Related papers (50 in total)
  • [31] Local projections for high-dimensional outlier detection
    Ortner, Thomas
    Filzmoser, Peter
    Rohm, Maia
    Brodinova, Sarka
    Breiteneder, Christian
    METRON-INTERNATIONAL JOURNAL OF STATISTICS, 2021, 79 (02): 189-206
  • [32] On-line monitoring data quality of high-dimensional data streams
    Qi, Dequan
    Li, Zhonghua
    Wang, Zhaojun
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2016, 86 (11): 2204-2216
  • [33] Statistical quality control analysis of high-dimensional omics data
    Kim, Yongkang
    Kim, Gyu-Tae
    Kwon, Min-Seok
    Park, Taesung
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2017, 18 (03): 210-222
  • [34] Towards High-dimensional Data Analysis in Air Quality Research
    Engel, D.
    Hummel, M.
    Hoepel, F.
    Bein, K.
    Wexler, A.
    Garth, C.
    Hamann, B.
    Hagen, H.
    COMPUTER GRAPHICS FORUM, 2013, 32 (03): 101-110
  • [35] Quality Metrics in High-Dimensional Data Visualization: An Overview and Systematization
    Bertini, Enrico
    Tatu, Andrada
    Keim, Daniel
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2011, 17 (12): 2203-2212
  • [36] LEARNING FROM HIGH-DIMENSIONAL NOISY DATA VIA PROJECTIONS ONTO MULTI-DIMENSIONAL ELLIPSOIDS
    Gong, Liuling
    Schonfeld, Dan
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010: 1970-1973
  • [37] Ascending and Descending Order of Random Projections: Comparative Analysis of High-Dimensional Data Clustering
    Pasunuri, Raghunadh
    Venkaiah, Vadlamudi China
    Dhariyal, Bhaskar
    HARMONY SEARCH AND NATURE INSPIRED OPTIMIZATION ALGORITHMS, 2019, 741: 133-142
  • [38] Uncovering interpretable relationships in high-dimensional scientific data through function preserving projections
    Liu, Shusen
    Anirudh, Rushil
    Thiagarajan, Jayaraman J.
    Bremer, Peer-Timo
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2020, 1 (04)
  • [39] tourrGui: A gWidgets GUI for the Tour to Explore High-Dimensional Data Using Low-Dimensional Projections
    Huang, Bei
    Cook, Dianne
    Wickham, Hadley
    JOURNAL OF STATISTICAL SOFTWARE, 2012, 49 (06): 1-12
  • [40] High-dimensional outlier detection using random projections
    Navarro-Esteban, P.
    Cuesta-Albertos, J. A.
    TEST, 2021, 30 (04): 908-934