Estimating the number of clusters in a ranking data context

Cited by: 3

Authors:

Calmon, Wilson [1]

Albi, Mariana [1]

Affiliations:

[1] Fluminense Fed Univ, Inst Math & Stat, BR-24210201 Niteroi, RJ, Brazil

Source:

INFORMATION SCIENCES | 2021 / Volume 546

Keywords:

Number of clusters; Ranking data; Plackett-Luce; Clustering; Ordinal classification

DOI:

10.1016/j.ins.2020.09.056

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This study introduces two methods for estimating the number of clusters specially designed to identify the number of groups in a finite population of objects or items ranked by several judges under the assumption that these judges belong to a homogeneous population. The proposed methods are both based on a hierarchical version of the classical Plackett-Luce model in which the number of clusters is set as an additional parameter. These methods do not require continuous score data to be available or restrict the number of clusters to be greater than one or less than the total number of objects, thereby enabling their application in a wide range of scenarios. The results of a large simulation study suggest that the proposed methods outperform well-established methodologies (Calinski & Harabasz, gap, Hartigan, Krzanowski & Lai, jump, and silhouette) as well as some recently proposed approaches (instability, quantization error modeling, slope, and utility). They realize the highest percentages of correct estimates of the number of clusters and the smallest errors compared with these well-established methodologies. We illustrate the proposed methods by analyzing a ranking dataset obtained from Formula One motor racing. (c) 2020 Elsevier Inc. All rights reserved.
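As background for the abstract, the following is a minimal sketch of the classical Plackett-Luce model that the proposed methods build on. It is not the authors' hierarchical version and does not estimate the number of clusters. The function name `plackett_luce_log_prob` and the `worth` parameter vector are illustrative assumptions, not names taken from the paper. Under this model, a ranking is built by choosing items one at a time, each with probability proportional to its worth among the items not yet ranked.

```python
import math


def plackett_luce_log_prob(ranking, worth):
    """Log-probability of a full ranking under the classical Plackett-Luce model.

    ranking: sequence of item indices, most preferred first (hypothetical input format).
    worth:   sequence of positive worth parameters, one per item.
    """
    # Worths of the items in the order the judge ranked them.
    weights = [worth[item] for item in ranking]
    log_prob = 0.0
    for position in range(len(weights)):
        # Probability of picking this item among those still unranked.
        log_prob += math.log(weights[position]) - math.log(sum(weights[position:]))
    return log_prob
```

For example, with worths `[3.0, 2.0, 1.0]`, the ranking `[0, 1, 2]` has probability (3/6) × (2/3) × (1/1) = 1/3.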


Pages: 977-995

Page count: 19

