Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering

Cited by: 3
Authors
Ichino, Manabu [1]
Umbleja, Kadri [2]
Yaguchi, Hiroyuki [1]
Affiliations
[1] Tokyo Denki Univ, Sch Sci & Engn, Hatoyama, Saitama 3500394, Japan
[2] Tallinn Univ Technol, Dept Comp Syst, Ehitajate Tee 5, EE-19086 Tallinn, Estonia
Source
STATS | 2021 / Vol. 4 / Issue 02
Keywords
unsupervised feature selection; histogram-valued data; compactness; hierarchical conceptual clustering; multi-role measure; visualization;
DOI
10.3390/stats4020024
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701 ; 070101 ;
Abstract
This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal-probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness of the generated cluster. This means that the compactness plays the role of a similarity measure between the objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness also plays the role of a cluster quality measure. We also show that the average compactness of each feature with respect to objects and/or clusters over several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method using an artificial data set and real histogram-valued data sets.
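The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the paper's exact definition: it assumes each object is summarized per feature by m + 1 quantile values, that a cluster is described by the element-wise minimum and maximum of its members' quantile vectors, and that compactness is the average width of the resulting m bins divided by the feature's overall span. The names `join`, `per_feature_compactness`, and `cluster` are hypothetical.

```python
from itertools import combinations

import numpy as np


def join(a, b):
    """Merge two descriptions (lo, hi), each of shape (p, m + 1)."""
    return np.minimum(a[0], b[0]), np.maximum(a[1], b[1])


def per_feature_compactness(desc, span):
    """Assumed compactness: mean normalized bin width for each feature."""
    lo, hi = desc
    # Bin k runs from the lower bound of quantile k to the upper bound of k + 1.
    widths = hi[:, 1:] - lo[:, :-1]          # shape (p, m)
    return widths.mean(axis=1) / span        # shape (p,)


def cluster(quantiles):
    """Agglomerate objects, always merging the pair with the smallest compactness.

    quantiles: array-like of shape (n, p, m + 1).
    Returns the merge history and the average per-feature compactness.
    """
    q = np.asarray(quantiles, dtype=float)
    span = q.max(axis=(0, 2)) - q.min(axis=(0, 2))
    span = np.where(span > 0, span, 1.0)     # guard against constant features
    clusters = {i: (q[i], q[i]) for i in range(len(q))}
    members = {i: [i] for i in range(len(q))}
    merges, feature_scores = [], []
    while len(clusters) > 1:
        # The pair whose joined description is most compact is merged next.
        i, j = min(
            combinations(clusters, 2),
            key=lambda ij: per_feature_compactness(
                join(clusters[ij[0]], clusters[ij[1]]), span
            ).mean(),
        )
        merged = join(clusters[i], clusters[j])
        c = per_feature_compactness(merged, span)
        members[i] = members[i] + members[j]
        clusters[i] = merged
        del clusters[j], members[j]
        merges.append((list(members[i]), float(c.mean())))
        feature_scores.append(c)
    # Features with a small average compactness are candidates for selection.
    return merges, np.mean(feature_scores, axis=0)
```

Under these assumptions, the compactness recorded at each merge can serve as the dendrogram height, and the returned per-feature averages give a simple ranking of features, with smaller values indicating more informative features.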
Pages: 359-384
Number of pages: 26
Related papers
50 in total
  • [1] Convex clustering analysis for histogram-valued data
    Park, Cheolwoo
    Choi, Hosik
    Delcher, Chris
    Wang, Yanning
    Yoon, Young Joo
    BIOMETRICS, 2019, 75 (02) : 603 - 612
  • [2] Double monothetic clustering for histogram-valued data
    Kim, Jaejik
    Billard, L.
    COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2018, 25 (03) : 263 - 274
  • [3] The Lookup Table Regression Model for Histogram-Valued Symbolic Data
    Ichino, Manabu
    STATS, 2022, 5 (04) : 1271 - 1293
  • [4] Copulas and Histogram-Valued Data
    Jin, Honghe
    Billard, Lynne
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2023, 32 (02) : 642 - 657
  • [5] Histogram-valued data on value at risk measures: a symbolic approach for risk attribution
    Toque, Carole
    Terraza, Virginie
    APPLIED ECONOMICS LETTERS, 2014, 21 (17) : 1243 - 1251
  • [6] Principal component analysis for histogram-valued data
    J. Le-Rademacher
    L. Billard
    Advances in Data Analysis and Classification, 2017, 11 : 327 - 351
  • [7] Classification of histogram-valued data with support histogram machines
    Kang, Ilsuk
    Park, Cheolwoo
    Yoon, Young Joo
    Park, Changyi
    Kwon, Soon-Sun
    Choi, Hosik
    JOURNAL OF APPLIED STATISTICS, 2023, 50 (03) : 675 - 690
  • [8] Principal component analysis for histogram-valued data
    Le-Rademacher, J.
    Billard, L.
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2017, 11 (02) : 327 - 351
  • [9] Unsupervised Hierarchical Feature Selection on Networked Data
    Zhang, Yuzhe
    Chen, Chen
    Luo, Minnan
    Li, Jundong
    Yan, Caixia
    Zheng, Qinghua
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2020), PT III, 2020, 12114 : 137 - 153
  • [10] A polythetic clustering process and cluster validity indexes for histogram-valued objects
    Kim, Jaejik
    Billard, L.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2011, 55 (07) : 2250 - 2262