k-PbC: an improved cluster center initialization for categorical data clustering

Authors
Duy-Tai Dinh
Van-Nam Huynh
Affiliation
[1] Japan Advanced Institute of Science and Technology,
Source
Applied Intelligence | 2020 / Vol. 50
Keywords
Data mining; Distance-based clustering; Pattern mining; Maximal frequent itemsets; Cluster center initialization; Categorical data;
DOI
Not available
Abstract
The performance of a partitional clustering algorithm depends on the initial choice of cluster centers, which is usually random, so different runs of the algorithm on the same data set often yield different results. This paper addresses that challenge by proposing an algorithm named k-PbC, which uses non-random, pattern-mining-based initialization to improve clustering quality. Specifically, k-PbC first mines maximal frequent itemsets to find a set of initial clusters. It then uses a kernel-based method to form cluster centers and an information-theoretic dissimilarity measure to estimate the distance between cluster centers and data objects. An extensive experimental study on various real categorical data sets compares k-PbC with state-of-the-art categorical clustering algorithms in terms of clustering quality. The comparative results show that the proposed initialization method can enhance clustering results and that k-PbC outperforms the compared algorithms on both internal and external validation metrics.
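The general idea of pattern-based initialization described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' k-PbC implementation: the function names (`maximal_frequent_itemsets`, `initial_modes`, `assign`), the brute-force miner, and the `min_support` parameter are all assumptions made for illustration. For simplicity the sketch also substitutes plain per-attribute modes for the kernel-based cluster centers and Hamming distance for the information-theoretic dissimilarity measure.

```python
from collections import Counter
from itertools import combinations


def maximal_frequent_itemsets(rows, min_support):
    """Brute-force miner (illustrative only; exponential in the number of items).

    Each row is a tuple of categorical values; an item is (attribute_index, value).
    Returns (itemset, support) pairs for frequent itemsets with no frequent superset.
    """
    transactions = [frozenset(enumerate(row)) for row in rows]
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(rows[0]) + 1):
        for combo in combinations(items, size):
            # An object has one value per attribute, so skip itemsets that
            # contain two values of the same attribute.
            if len({attr for attr, _ in combo}) < size:
                continue
            candidate = frozenset(combo)
            support = sum(candidate <= t for t in transactions)
            if support >= min_support:
                frequent.append((candidate, support))
    return [(s, sup) for s, sup in frequent
            if not any(s < other for other, _ in frequent)]


def initial_modes(rows, k, min_support):
    """Pick k patterns (longest, then best supported) and turn each into a center.

    Assumption: the center is the per-attribute mode of the objects matching
    the pattern, a stand-in for the kernel-based centers named in the abstract.
    """
    patterns = maximal_frequent_itemsets(rows, min_support)
    patterns.sort(key=lambda p: (-len(p[0]), -p[1], sorted(p[0])))
    modes = []
    for itemset, _ in patterns[:k]:
        members = [r for r in rows if itemset <= frozenset(enumerate(r))]
        modes.append(tuple(Counter(col).most_common(1)[0][0]
                           for col in zip(*members)))
    return modes


def hamming(a, b):
    """Number of attributes on which two objects differ (simple stand-in measure)."""
    return sum(x != y for x, y in zip(a, b))


def assign(rows, modes):
    """Assign every object to its nearest initial center."""
    return [min(range(len(modes)), key=lambda j: hamming(r, modes[j]))
            for r in rows]


if __name__ == "__main__":
    data = [
        ("red", "small", "round"),
        ("red", "small", "round"),
        ("red", "small", "oval"),
        ("blue", "large", "square"),
        ("blue", "large", "square"),
        ("blue", "large", "oval"),
    ]
    centers = initial_modes(data, k=2, min_support=2)
    print(centers)
    print(assign(data, centers))
```

Because the initial centers are derived from patterns in the data rather than drawn at random, repeated runs on the same data set start from the same centers, which is the source of instability the abstract identifies.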
Pages: 2610-2632
Page count: 22
Related papers
  • [1] k-PbC: an improved cluster center initialization for categorical data clustering
    Duy-Tai Dinh
    Van-Nam Huynh
    [J]. APPLIED INTELLIGENCE, 2020, 50 (08) : 2610 - 2632
  • [2] A cluster centers initialization method for clustering categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    Cao, Fuyuan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (09) : 8022 - 8029
  • [3] Initialization of K-Modes Clustering for Categorical Data
    Li Tao-ying
    Chen Yan
    Jin Zhi-hong
    Li Ye
    [J]. 2013 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (ICMSE), 2013, : 107 - 112
  • [4] Cluster center initialization algorithm for K-modes clustering
    Khan, Shehroz S.
    Ahmad, Amir
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (18) : 7444 - 7456
  • [5] Cluster center initialization algorithm for K-means clustering
    Khan, SS
    Ahmad, A
    [J]. PATTERN RECOGNITION LETTERS, 2004, 25 (11) : 1293 - 1302
  • [6] A new initialization method for clustering categorical data
    Wu, Shu
    Jiang, Qingshan
    Huang, Joshua Zhexue
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2007, 4426 : 972 - +
  • [7] A new initialization method for categorical data clustering
    Cao, Fuyuan
    Liang, Jiye
    Bai, Liang
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (07) : 10223 - 10228
  • [8] A Support Based Initialization Algorithm for Categorical Data Clustering
    Kumar, Ajay
    Kumar, Shishir
    [J]. JOURNAL OF INFORMATION TECHNOLOGY RESEARCH, 2018, 11 (02) : 53 - 67
  • [9] An initialization method to simultaneously find initial cluster centers and the number of clusters for clustering categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    [J]. KNOWLEDGE-BASED SYSTEMS, 2011, 24 (06) : 785 - 795
  • [10] An Improved Initialization Center K-means Clustering Algorithm Based on Distance and Density
    Duan, Yanling
    Liu, Qun
    Xia, Shuyin
    [J]. ADVANCES IN MATERIALS, MACHINERY, ELECTRONICS II, 2018, 1955