Context-Based Geodesic Dissimilarity Measure for Clustering Categorical Data

Cited by: 1
Authors
Lee, Changki [1]
Jung, Uk [2]
Affiliations
[1] Samsung Elect Co Ltd, Suwon 16677, South Korea
[2] Dongguk Univ Seoul, Sch Business, Dept Management, Seoul 04620, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 18
Keywords
geodesic distance; categorical data; mutual k-nearest neighbor graph; association-based dissimilarity; Gower distance; DIVERGENCE; ALGORITHM
DOI
10.3390/app11188416
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Measuring the dissimilarity between two observations underlies many data mining and machine learning algorithms, and the quality of the measure has a significant impact on learning outcomes. For continuous data, computing a dissimilarity or distance is a manageable problem because many numerical operations apply directly. For observations described by categorical variables, however, defining a dissimilarity between pairs is not straightforward. This study proposes a new measure of the dissimilarity between two categorical observations, called the context-based geodesic dissimilarity measure, for clustering categorical data. The proposed method takes the relationships between categorical variables into account and discovers the implicit topological structure of categorical data; as a result, it can reflect the nonlinear patterns of arbitrarily shaped categorical data clusters. The experimental results confirm that the proposed measure, which considers both nonlinear data patterns and relationships among the categorical variables, yields better clustering performance than other distance measures.
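The general idea named in the abstract and keywords (a geodesic distance computed over a mutual k-nearest neighbor graph) can be illustrated with a minimal sketch. This is not the authors' algorithm: the base dissimilarity below is plain simple matching (Hamming), used as a stand-in for the association-based measure, and the function names `hamming`, `mutual_knn_graph`, and `geodesic_dissimilarity`, the toy data, and the choice `k=2` are all assumptions made for illustration.

```python
import heapq
from math import inf


def hamming(a, b):
    """Simple-matching dissimilarity: fraction of attributes that differ.
    Assumption: stands in for the paper's association-based measure."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mutual_knn_graph(data, k, base=hamming):
    """Connect i and j only if each is among the other's k nearest neighbors."""
    n = len(data)
    d = [[base(data[i], data[j]) for j in range(n)] for i in range(n)]
    # k nearest neighbors of each point, excluding itself; ties broken by index.
    knn = [
        set(sorted((j for j in range(n) if j != i),
                   key=lambda j: (d[i][j], j))[:k])
        for i in range(n)
    ]
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:  # mutual neighbors only
                graph[i][j] = d[i][j]
    return graph


def geodesic_dissimilarity(graph, source):
    """Shortest-path (Dijkstra) lengths from source; inf if unreachable."""
    dist = {v: inf for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = du + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy categorical observations (hypothetical), each with three attributes.
data = [("a", "a", "a"), ("a", "a", "b"), ("a", "b", "b"), ("b", "b", "b")]
graph = mutual_knn_graph(data, k=2)
geo = geodesic_dissimilarity(graph, source=0)
print(graph)
print(geo)
```

On this toy data the mutual 2-nearest-neighbor graph is a chain, so the dissimilarity between the first and last observations is accumulated edge by edge along the chain rather than read off directly. Observations in different connected components get an infinite geodesic dissimilarity in this sketch; how disconnected components are handled in practice is a design choice not covered here.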
Pages: 15
Related papers
50 in total
  • [1] A New Context-Based Clustering Framework for Categorical Data
    Thanh-Phu Nguyen
    Duy-Tai Dinh
    Van-Nam Huynh
    [J]. PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2018, 11012 : 697 - 709
  • [2] Context-Based Distance Learning for Categorical Data Clustering
    Ienco, Dino
    Pensa, Ruggero G.
    Meo, Rosa
    [J]. ADVANCES IN INTELLIGENT DATA ANALYSIS VIII, PROCEEDINGS, 2009, 5772 : 83 - 94
  • [3] From Context to Distance: Learning Dissimilarity for Categorical Data Clustering
    Ienco, Dino
    Pensa, Ruggero G.
    Meo, Rosa
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2012, 6 (01)
  • [4] An effective dissimilarity measure for clustering of high-dimensional categorical data
    Lee, Jeonghoon
    Lee, Yoon-Joon
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 38 (03) : 743 - 757
  • [5] An effective dissimilarity measure for clustering of high-dimensional categorical data
    Jeonghoon Lee
    Yoon-Joon Lee
    [J]. Knowledge and Information Systems, 2014, 38 : 743 - 757
  • [6] Learning-Based Dissimilarity for Clustering Categorical Data
    Rivera Rios, Edgar Jacob
    Angel Medina-Perez, Miguel
    Lazo-Cortes, Manuel S.
    Monroy, Raul
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (08):
  • [7] An association-based dissimilarity measure for categorical data
    Le, SQ
    Ho, TB
    [J]. PATTERN RECOGNITION LETTERS, 2005, 26 (16) : 2549 - 2557
  • [8] Graph Enhanced Fuzzy Clustering for Categorical Data Using a Bayesian Dissimilarity Measure
    Zhang, Chuanbin
    Chen, Long
    Zhao, Yin-Ping
    Wang, Yingxu
    Chen, C. L. Philip
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2023, 31 (03) : 810 - 824
  • [9] An efficient entropy based dissimilarity measure to cluster categorical data
    Kar, Amit Kumar
    Mishra, Amaresh Chandra
    Mohanty, Sraban Kumar
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 119
  • [10] A Comparative Analysis of Dissimilarity Measures for Clustering Categorical Data
    Xavier-Junior, Joao C.
    Canuto, Anne M. P.
    Almeida, Noriedson D.
    Goncalves, Luiz M. G.
    [J]. 2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,