A model for clustering data from heterogeneous dissimilarities

Cited by: 14

Authors:

Santi, Everton [1]

Aloise, Daniel [2]

Blanchard, Simon J. [3]

Affiliations:

[1] Univ Fed Rio Grande do Norte, Sch Sci & Technol, BR-59072970 Natal, RN, Brazil

[2] Univ Fed Rio Grande do Norte, Dept Comp Engn & Automat, BR-59072970 Natal, RN, Brazil

[3] Georgetown Univ, McDonough Sch Business, Washington, DC 20057 USA

Source:

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH | 2016 / Vol. 253 / Issue 03

Keywords:

Data mining; Clustering; Heterogeneity; Optimization; Heuristics; VARIABLE NEIGHBORHOOD SEARCH; P-MEDIAN PROBLEM; CONSUMER; CONTEXT; BRANCH; REPRESENTATIONS; SUBSTITUTION; RELAXATIONS; PREFERENCE; ALGORITHM;

DOI:

10.1016/j.ejor.2016.03.033

Chinese Library Classification (CLC) number:

C93 [Management Science];

Discipline classification code:

12 ; 1201 ; 1202 ; 120202 ;

Abstract:

Clustering algorithms partition a set of n objects into p groups (called clusters), such that objects assigned to the same groups are homogeneous according to some criteria. To derive these clusters, the data input required is often a single n x n dissimilarity matrix. Yet for many applications, more than one instance of the dissimilarity matrix is available and so to conform to model requirements, it is common practice to aggregate (e.g., sum up, average) the matrices. This aggregation practice results in clustering solutions that mask the true nature of the original data. In this paper we introduce a clustering model which, to handle the heterogeneity, uses all available dissimilarity matrices and identifies for groups of individuals clustering objects in a similar way. The model is a nonconvex problem and difficult to solve exactly, and we thus introduce a Variable Neighborhood Search heuristic to provide solutions efficiently. Computational experiments and an empirical application to perception of chocolate candy show that the heuristic algorithm is efficient and that the proposed model is suited for recovering heterogeneous data. Implications for clustering researchers are discussed. (C) 2016 Elsevier B.V. All rights reserved.
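The abstract's claim that aggregation can mask heterogeneity can be illustrated with a small toy sketch. This is not the authors' model or their Variable Neighborhood Search heuristic; the matrices `A` and `B`, the helper names, and the brute-force search over two-cluster partitions are all assumptions made purely for illustration. Two hypothetical individuals group four objects differently, and the averaged matrix no longer singles out either grouping.

```python
from itertools import combinations

def within_cost(D, clusters):
    """Sum of pairwise dissimilarities between objects placed in the same cluster."""
    return sum(D[i][j] for c in clusters for i, j in combinations(sorted(c), 2))

def two_cluster_partitions(n):
    """Yield each split of objects 0..n-1 into two non-empty clusters exactly once."""
    objects = range(n)
    for size in range(1, n):
        for first in combinations(objects, size):
            if 0 in first:  # pin object 0 to the first cluster to skip mirrored splits
                rest = tuple(o for o in objects if o not in first)
                yield (first, rest)

def best_partitions(D):
    """Return all two-cluster partitions that attain the minimum within-cluster cost."""
    costs = {p: within_cost(D, p) for p in two_cluster_partitions(len(D))}
    best = min(costs.values())
    return sorted(p for p, c in costs.items() if c == best)

# Hypothetical individual A perceives {0, 1} and {2, 3} as the natural groups.
A = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0],
     [1, 1, 0, 0]]

# Hypothetical individual B perceives {0, 2} and {1, 3} as the natural groups.
B = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

# Element-wise average, i.e. the common aggregation practice the abstract refers to.
avg = [[(a + b) / 2 for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

print(best_partitions(A))    # one clear optimum for individual A
print(best_partitions(B))    # a different clear optimum for individual B
print(best_partitions(avg))  # the two groupings tie on the averaged matrix
```

In this toy case each individual matrix has a single best partition, while the averaged matrix leaves the two groupings tied, so a method run on the aggregate alone cannot tell which structure the data actually contain.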


Pages: 659-672

Page count: 14
