Measuring the quality of projections of high-dimensional labeled data

Cited by: 2
Authors
Benato, Barbara C. [1 ]
Falcao, Alexandre X. [1 ]
Telea, Alexandru C. [2 ]
Affiliations
[1] Univ Estadual Campinas, Inst Comp, Ave Albert Einstein 1251, BR-13083852 Campinas, Brazil
[2] Univ Utrecht, Fac Sci, Dept Informat & Comp Sci, Utrecht, Netherlands
Source
COMPUTERS & GRAPHICS-UK | 2023 / Vol. 116
Funding
São Paulo Research Foundation, Brazil;
Keywords
Quality of projections; Labeled data; Pseudo labeling; REDUCTION; ALGORITHMS;
DOI
10.1016/j.cag.2023.08.023
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification codes
081202 ; 0835 ;
Abstract
Dimensionality reduction techniques, also called projections, are one of the main tools for visualizing high-dimensional data. To compare such techniques, several quality metrics have been proposed. However, such metrics may not capture the visual separation among groups/classes of samples in a projection, i.e., having groups of similar (same label) points far from other (distinct label) groups of points. To address this, we propose a pseudo-labeling mechanism that assesses visual separation using the performance of a semi-supervised optimum-path forest classifier (OPFSemi), measured by Cohen's Kappa. We argue that lower label propagation errors by OPFSemi in projections are related to higher data/visual separation. OPFSemi explores local and global information of the data distribution when computing optimum connectivity between samples in a projection for label propagation. It is parameter-free, fast to compute, easy to implement, and generically handles any high-dimensional quantitative labeled dataset and projection technique. We compare our approach with four scalar metrics commonly used in the literature, for 18 datasets and 39 projection techniques. Our results show that the proposed metric consistently yields values in line with the perceived visual separation, surpassing existing projection-quality metrics in this respect. © 2023 Elsevier Ltd. All rights reserved.
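The scoring idea in the abstract (propagate labels inside the 2D projection, then measure agreement with the true labels using Cohen's Kappa) can be illustrated with a minimal sketch. This is not the authors' implementation: OPFSemi is replaced here by a much simpler nearest-labeled-neighbor rule, and the names `propagate_nearest_labeled`, `separation_score`, `separated`, and `mixed` are hypothetical, introduced only for this example. Only the kappa formula, (p_o - p_e) / (1 - p_e), is standard.

```python
import math
from collections import Counter


def cohen_kappa(true_labels, predicted_labels):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(true_labels)
    if n == 0 or n != len(predicted_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both labelings use one identical class
    return (observed - expected) / (1.0 - expected)


def propagate_nearest_labeled(points, labels, labeled_idx):
    """Stand-in for OPFSemi (assumption, NOT the paper's classifier):
    each unlabeled point copies the label of its nearest labeled point in 2D."""
    labeled = set(labeled_idx)
    propagated = list(labels)
    for i, p in enumerate(points):
        if i in labeled:
            continue
        nearest = min(labeled_idx, key=lambda j: math.dist(p, points[j]))
        propagated[i] = labels[nearest]
    return propagated


def separation_score(points, labels, labeled_idx):
    """Kappa between true and propagated labels on the unlabeled points only."""
    labeled = set(labeled_idx)
    propagated = propagate_nearest_labeled(points, labels, labeled_idx)
    rest = [i for i in range(len(points)) if i not in labeled]
    return cohen_kappa([labels[i] for i in rest], [propagated[i] for i in rest])


# Toy projection with two well-separated classes.
separated = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
separated_labels = ["a", "a", "a", "b", "b", "b"]

# Toy projection where the two classes alternate along a line.
mixed = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
mixed_labels = ["a", "b", "a", "b", "a", "b"]

print(separation_score(separated, separated_labels, [0, 3]))  # 1.0
print(separation_score(mixed, mixed_labels, [0, 1]))          # 0.0
```

In this toy setting the well-separated projection scores a kappa of 1.0 and the interleaved one scores 0.0, which matches the intuition that lower propagation error goes with higher visual separation. The published metric relies on OPFSemi's optimum-path connectivity rather than this nearest-neighbor shortcut, so actual scores would differ.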
Pages: 287-297
Number of pages: 11
Related papers
50 in total
  • [11] ON ALMOST LINEARITY OF LOW-DIMENSIONAL PROJECTIONS FROM HIGH-DIMENSIONAL DATA
    HALL, P
    LI, KC
    ANNALS OF STATISTICS, 1993, 21 (02): : 867 - 889
  • [12] High-dimensional labeled data analysis with topology representing graphs
    Aupetit, M
    Catz, T
    NEUROCOMPUTING, 2005, 63 : 139 - 169
  • [13] Interpreting High-Dimensional Projections With Capacity
    Zhang, Yang
    Liu, Jisheng
    Lai, Chufan
    Zhou, Yuan
    Chen, Siming
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (09) : 6038 - 6055
  • [14] Cantor sets with high-dimensional projections
    Frolkina, Olga
    TOPOLOGY AND ITS APPLICATIONS, 2020, 275
  • [15] Interactive tools for explaining multidimensional projections for high-dimensional tabular data
    Thijssen, Julian
    Tian, Zonglin
    Telea, Alexandru
    COMPUTERS & GRAPHICS-UK, 2024, 122
  • [16] Pollux: Interactive Cluster-First Projections of High-Dimensional Data
    Wenskovitch, John
    North, Chris
    2019 IEEE VISUALIZATION IN DATA SCIENCE (VDS), 2019, : 38 - 47
  • [17] Visualnostics: Visual Guidance Pictograms for Analyzing Projections of High-dimensional Data
    Lehmann, Dirk J.
    Kemmler, Fritz
    Zhyhalava, Tatsiana
    Kirschke, Marco
    Theisel, Holger
    COMPUTER GRAPHICS FORUM, 2015, 34 (03) : 291 - 300
  • [18] Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
    Gong, Yunchao
    Kumar, Sanjiv
    Rowley, Henry A.
    Lazebnik, Svetlana
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 484 - 491
  • [19] Using projections to visually cluster high-dimensional
    Hinneburg, A
    Keim, D
    Wawryniuk, M
    COMPUTING IN SCIENCE & ENGINEERING, 2003, 5 (02) : 14 - 25
  • [20] Random projections for high-dimensional optimization problems
    Vu, Ky
    4OR-A QUARTERLY JOURNAL OF OPERATIONS RESEARCH, 2017, 15 (03): : 327 - 328