Automatic identification of the number of clusters in hierarchical clustering

Cited by: 30
Authors
Karna, Ashutosh [1, 2]
Gibert, Karina [3]
Affiliations
[1] HP Inc, Printing & Digital Mfg 3D, Catalonia, Spain
[2] Univ Politecn Cataluna, BarcelonaTech, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
[3] Univ Politecn Cataluna, BarcelonaTech, Knowledge Engn & Machine Learning Grp, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
Source
NEURAL COMPUTING & APPLICATIONS | 2022 / Vol. 34 / No. 01
Keywords
Hierarchical clustering; Calinski-Harabasz index; Scalability; Data Science; 3D Printing; Decision Support; ALGORITHM; PLUS
DOI
10.1007/s00521-021-05873-3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hierarchical clustering is one of the most suitable tools for discovering the underlying true structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not suitable. In many real applications, it provides a perspective on the inner structure of the data and is preferred to partitional methods. However, determining the number of clusters in hierarchical clustering requires a human expert to deduce it from the dendrogram, which is a major challenge for building fully automatic systems such as those required for decision support in Industry 4.0. This research proposes a general criterion for cutting a dendrogram automatically, obtained by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by experts, and a winning criterion is determined. This research is part of a larger project to build an Intelligent Decision Support System that assesses the performance of 3D printers from real-time sensor data, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from 3D printers, and comparing the CPU time of the entire clustering method before and after this modification shows a huge reduction. The approach also reduces dependence on human experts to provide the number of clusters by inspecting the dendrogram. Further, it allows hierarchical clustering to run in automatic mode in real-life industrial applications, enables continuous monitoring of real 3D printers in production, and helps build an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns.
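The abstract does not spell out the six criteria, so the following is only a minimal sketch of one simple way to use the Calinski-Harabasz index for an automatic cut: pick the number of clusters k that maximizes CH(k) = [B/(k - 1)] / [W/(n - k)], where B and W are the between- and within-cluster sums of squares. This common maximum-CH rule is not the authors' winning criterion. Assumptions: Euclidean distance, average linkage, and the helper names (`calinski_harabasz`, `average_linkage_partitions`, `choose_k_by_max_ch`) are hypothetical illustrations in plain Python, suitable only for small inputs.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def calinski_harabasz(data, labels):
    """CH(k) = (B / (k - 1)) / (W / (n - k)), where B is the between-cluster
    and W the within-cluster sum of squared distances."""
    n = len(data)
    groups = {}
    for point, lab in zip(data, labels):
        groups.setdefault(lab, []).append(point)
    k = len(groups)
    if not 2 <= k < n:
        raise ValueError("the CH index needs 2 <= k < n")
    overall = centroid(data)
    between = sum(len(g) * sq_dist(centroid(g), overall) for g in groups.values())
    within = 0.0
    for g in groups.values():
        c = centroid(g)
        within += sum(sq_dist(p, c) for p in g)
    if within == 0:
        return math.inf  # perfectly compact clusters (e.g. duplicate points)
    return (between / (k - 1)) / (within / (n - k))


def average_linkage_partitions(data):
    """Naive average-linkage (UPGMA) agglomeration with Euclidean distance.

    Returns {k: labels} for every level of the dendrogram, from n singletons
    down to one cluster. Roughly cubic cost: illustration only.
    """
    n = len(data)
    clusters = [[i] for i in range(n)]

    def labels_of(current):
        labels = [0] * n
        for lab, members in enumerate(current):
            for i in members:
                labels[i] = lab
        return labels

    partitions = {n: labels_of(clusters)}
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(math.dist(data[i], data[j])
                        for i in clusters[a] for j in clusters[b])
            avg = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        partitions[len(clusters)] = labels_of(clusters)
    return partitions


def choose_k_by_max_ch(data, k_max=None):
    """Cut the dendrogram at the k in [2, k_max] with the highest CH index."""
    n = len(data)
    k_max = n - 1 if k_max is None else min(k_max, n - 1)
    partitions = average_linkage_partitions(data)
    scores = {k: calinski_harabasz(data, partitions[k]) for k in range(2, k_max + 1)}
    best_k = max(scores, key=scores.get)
    return best_k, partitions[best_k], scores


if __name__ == "__main__":
    # Three well-separated blobs of three points each.
    points = [(0, 0), (0, 1), (1, 0),
              (10, 10), (10, 11), (11, 10),
              (20, 0), (20, 1), (21, 0)]
    k, labels, _ = choose_k_by_max_ch(points)
    print(k, labels)  # k is 3 for this data
```

For real data, `scipy.cluster.hierarchy.linkage` and `fcluster` (with `criterion="maxclust"`) together with `sklearn.metrics.calinski_harabasz_score` provide the same building blocks far more efficiently; the pure-Python version above only makes each step explicit.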
Pages: 119-134
Page count: 16
Related papers (50 in total)
  • [2] Hierarchical clustering algorithms with automatic estimation of the number of clusters
    Abe, Ryosuke
    Miyamoto, Sadaaki
    Endo, Yasunori
    Hamasuna, Yukihiro
    [J]. 2017 JOINT 17TH WORLD CONGRESS OF INTERNATIONAL FUZZY SYSTEMS ASSOCIATION AND 9TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (IFSA-SCIS), 2017,
  • [3] Automatic extraction of clusters from hierarchical clustering representations
    Sander, J
    Qin, XJ
    Lu, ZY
    Niu, N
    Kovarsky, A
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2003, 2637 : 75 - 87
  • [4] A Decision Criterion for the Optimal Number of Clusters in Hierarchical Clustering
    Jung, Yunjae
    Park, Haesun
    Du, Ding-Zhu
    Drake, Barry L.
    [J]. Journal of Global Optimization, 2003, 25 (1) : 91 - 111
  • [5] Indexes to Find the Optimal Number of Clusters in a Hierarchical Clustering
    Martin-Fernandez, Jose David
    Luna-Romera, Jose Maria
    Pontes, Beatriz
    Riquelme-Santos, Jose C.
    [J]. 14TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING MODELS IN INDUSTRIAL AND ENVIRONMENTAL APPLICATIONS (SOCO 2019), 2020, 950 : 3 - 13
  • [7] Distributed Fuzzy Clustering with Automatic Detection of the Number of Clusters
    Vendramin, L.
    Campello, R. J. G. B.
    Coletta, L. F. S.
    Hruschka, E. R.
    [J]. INTERNATIONAL SYMPOSIUM ON DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2011, 91 : 133 - 140
  • [8] Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms
    Salvador, S
    Chan, P
    [J]. ICTAI 2004: 16TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, : 576 - 584
  • [9] Cross-Clustering: A Partial Clustering Algorithm with Automatic Estimation of the Number of Clusters
    Tellaroli, Paola
    Bazzi, Marco
    Donato, Michele
    Brazzale, Alessandra R.
    Draghici, Sorin
    [J]. PLOS ONE, 2016, 11 (03):
  • [10] Hierarchical Language Identification based on Automatic Language Clustering
    Yin, Bo
    Ambikairajah, Eliathamby
    Chen, Fang
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 1217 - 1220