Silhouette Analysis for Performance Evaluation in Machine Learning with Applications to Clustering

Cited by: 108
Authors
Shutaywi, Meshal [1]
Kachouie, Nezamoddin N. [2]
Affiliations
[1] King Abdulaziz Univ, Dept Math, Rabigh 21911, Saudi Arabia
[2] Florida Inst Technol, Dept Math Sci, Melbourne, FL 32901 USA
Keywords
k-means; kernel k-means; machine learning; nonlinear clustering; silhouette index; weighted clustering
DOI
10.3390/e23060759
Chinese Library Classification
O4 [Physics]
Discipline classification code
0702
Abstract
Grouping objects based on their similarities is a common and important task in machine learning applications. Many clustering methods have been developed. Among them, k-means based methods have been broadly used, and several extensions, such as k-means++ and kernel k-means, have been developed to improve the original k-means method. K-means is a linear clustering method; that is, it divides the objects into linearly separable groups, while kernel k-means is a non-linear technique. Kernel k-means projects the elements to a higher-dimensional feature space using a kernel function and then groups them. Different kernel functions may not perform similarly when clustering a given data set, so choosing the right kernel for an application can be challenging. In our previous work, we introduced a weighted majority voting method for clustering based on normalized mutual information (NMI). NMI is a supervised measure: the true labels of a training set are required to calculate it. In this study, we extend our previous work on aggregating clustering results to develop an unsupervised weighting function for cases where a training set is not available. The proposed weighting function is based on the Silhouette index, an unsupervised criterion, so no training set is required to calculate it. This makes the new method more consistent with the unsupervised nature of clustering.
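To make the idea of an unsupervised, Silhouette-based weight concrete, the following is a minimal sketch in plain Python. It computes the standard mean silhouette coefficient for a labeling and then turns the scores of several candidate labelings into voting weights. The abstract does not state the paper's actual weighting function, so `silhouette_weights` (clip negative scores to zero, then normalize) is a hypothetical choice made only for illustration, as are the toy points and labelings.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: s(i) is conventionally set to 0.
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # mean distance to the point's own cluster
        b = min(               # mean distance to the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def silhouette_weights(points, candidate_labelings):
    """Hypothetical weighting (an assumption, not the paper's formula):
    clip negative silhouette scores to 0 and normalize so weights sum to 1."""
    raw = [max(silhouette_index(points, labs), 0.0) for labs in candidate_labelings]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = [0, 0, 1, 1]  # labeling that matches the visible structure
bad = [0, 1, 0, 1]   # labeling that mixes the two groups
weights = silhouette_weights(points, [good, bad])
```

No true labels appear anywhere in this computation, which is the property the abstract relies on: each candidate clustering (for example, one per kernel function) is scored from the data and its own labels alone.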
Pages: 17