Automatic identification of the number of clusters in hierarchical clustering

Cited by: 34
Authors
Karna, Ashutosh [1 ,2 ]
Gibert, Karina [3 ]
Affiliations
[1] HP Inc, Printing & Digital Mfg 3D, Catalonia, Spain
[2] Univ Politecn Cataluna, BarcelonaTech, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
[3] Univ Politecn Cataluna, BarcelonaTech, Knowledge Engn & Machine Learning Grp, Intelligent Data Sci & Artificial Intelligence Re, Catalonia, Spain
Source
NEURAL COMPUTING & APPLICATIONS | 2022, Volume 34, Issue 01
Keywords
Hierarchical clustering; Calinski-Harabasz index; Scalability; Data Science; 3D Printing; Decision Support; ALGORITHM; PLUS;
DOI
10.1007/s00521-021-05873-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hierarchical clustering is one of the most suitable tools for discovering the true underlying structure of a dataset in unsupervised learning, where the ground truth is unknown and classical machine learning classifiers are not suitable. In many real applications, it provides a perspective on the inner structure of the data and is preferred to partitional methods. However, determining the number of clusters in hierarchical clustering requires human expertise to deduce it from the dendrogram, which is a major obstacle to building a fully automatic system such as those required for decision support in Industry 4.0. This research proposes a general criterion for cutting a dendrogram automatically, by comparing six original criteria based on the Calinski-Harabasz index. The performance of each criterion on 95 real-life dendrograms of different topologies is evaluated against the number of classes proposed by experts, and a winning criterion is determined. This research is part of a larger project to build an Intelligent Decision Support System that assesses the performance of 3D printers from sensor data in real time, although the proposed criteria can be used in other real applications of hierarchical clustering. The methodology is applied to a real-life dataset from 3D printers, and a large reduction in CPU time is shown by comparing the CPU time of the entire clustering method before and after this modification. It also reduces the dependence on a human expert to provide the number of clusters by inspecting the dendrogram. Further, such a process allows hierarchical clustering to be applied in automatic mode in real-life industrial applications, allows the continuous monitoring of real 3D printers in production, and helps in building an Intelligent Decision Support System to detect operational modes, anomalies, and other behavioral patterns.
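The general idea of cutting a hierarchy with the Calinski-Harabasz (CH) index can be illustrated with a minimal sketch. The abstract does not define the six criteria, so the code below is not the authors' method: it assumes the simplest possible rule (pick the cut level with the highest CH score), assumes a Ward-style merge cost, and uses hypothetical names (`ward_cost`, `calinski_harabasz`, `auto_cut`). It is written in plain Python with no external dependencies; it is a naive implementation meant for small examples only.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_dist(centroid(a), centroid(b))


def calinski_harabasz(clusters):
    """CH = [B / (k - 1)] / [W / (n - k)], valid for 2 <= k <= n - 1."""
    k = len(clusters)
    n = sum(len(c) for c in clusters)
    overall = centroid([p for c in clusters for p in c])
    between = sum(len(c) * sq_dist(centroid(c), overall) for c in clusters)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    if within == 0:
        return float("inf")  # degenerate case: all clusters are single locations
    return (between / (k - 1)) / (within / (n - k))


def auto_cut(points, k_max=10):
    """Merge bottom-up and return (best_k, best_partition) by maximum CH.

    Assumption: the cut is chosen as the arg-max of CH over 2 <= k <= k_max.
    """
    clusters = [[p] for p in points]
    best_k, best_score, best_partition = None, float("-inf"), None
    while len(clusters) > 2:
        # Find the pair of clusters that is cheapest to merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        k = len(clusters)
        if k <= k_max:
            score = calinski_harabasz(clusters)
            if score > best_score:
                best_k, best_score = k, score
                best_partition = [list(c) for c in clusters]
    return best_k, best_partition


# Toy data (made up for illustration): three well-separated unit squares.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]
best_k, partition = auto_cut(points)
print(best_k, sorted(len(c) for c in partition))
```

On this toy input the CH score peaks at three clusters of four points each. For real datasets, library routines such as `scipy.cluster.hierarchy.linkage` with `fcluster`, together with `sklearn.metrics.calinski_harabasz_score`, would be the more usual building blocks.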
Pages: 119 - 134
Number of pages: 16
Related papers
50 in total
  • [21] Automatic Determination of the Number of Clusters for Semi-Supervised Relational Fuzzy Clustering
    Fantoukh, Norah Ibrahim
    Ben Ismail, Mohamed Maher
    Bchir, Ouiem
    INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2020, 20 (02) : 156 - 167
  • [22] Laboratory evaluation of a fully automatic modal identification algorithm using automatic hierarchical clustering approach
    Zonno, Giacomo
    Aguilar, Rafael
    Castaneda, Benjamin
    Boroschek, Ruben
    Lourenco, Paulo B.
    X INTERNATIONAL CONFERENCE ON STRUCTURAL DYNAMICS (EURODYN 2017), 2017, 199 : 882 - 887
  • [23] Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members
    Sadeghian, Azam
    Fazeli, Seyed Abolfazl Shahzadeh
    Karbassi, Seyed Mehdi
    IRANIAN JOURNAL OF MATHEMATICAL SCIENCES AND INFORMATICS, 2021, 16 (01) : 105 - 121
  • [24] A novel fuzzy clustering approach to regionalise watersheds with an automatic determination of optimal number of clusters
    Senent-Aparicio, Javier
    Soto, Jesus
    Perez-Sanchez, Julio
    Garrido, Jorge
    JOURNAL OF HYDROLOGY AND HYDROMECHANICS, 2017, 65 (04) : 359 - 365
  • [25] A Novel Approach for Automatic Number of Clusters Detection in Microarray Data based on Consensus Clustering
    Vinh, Nguyen Xuan
    Epps, Julien
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING, 2009, : 84 - 91
  • [26] Automatic selection of the number of clusters using Bayesian clustering and sparsity-inducing priors
    Valle, Denis
    Jameel, Yusuf
    Betancourt, Brenda
    Azeria, Ermias T.
    Attias, Nina
    Cullen, Joshua
    ECOLOGICAL APPLICATIONS, 2022, 32 (03)
  • [27] A Modified Multiobjective EA-based Clustering Algorithm with Automatic Determination of the Number of Clusters
    Tsai, Chun-Wei
    Chen, Wen-Ling
    Chiang, Ming-Chao
    PROCEEDINGS 2012 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2012, : 2833 - 2838
  • [28] Automatic stochastic subspace identification of modal parameters based on hierarchical clustering method
    Tang, B.-P., 1600, Chinese Vibration Engineering Society (31):
  • [29] Speaker identification based on subtractive clustering algorithm with estimating number of clusters
    Lee, Y
    Lee, KY
    Rheem, J
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2005, 3658 : 249 - 256
  • [30] A Heuristic Automatic Clustering Method Based on Hierarchical Clustering
    LaPlante, Francois
    Belacel, Nabil
    Kardouchi, Mustapha
    AGENTS AND ARTIFICIAL INTELLIGENCE, ICAART 2014, 2015, 8946 : 312 - 328