A model for clustering data from heterogeneous dissimilarities

Cited by: 14
Authors
Santi, Everton [1 ]
Aloise, Daniel [2 ]
Blanchard, Simon J. [3 ]
Affiliations
[1] Univ Fed Rio Grande do Norte, Sch Sci & Technol, BR-59072970 Natal, RN, Brazil
[2] Univ Fed Rio Grande do Norte, Dept Comp Engn & Automat, BR-59072970 Natal, RN, Brazil
[3] Georgetown Univ, McDonough Sch Business, Washington, DC 20057 USA
Keywords
Data mining; Clustering; Heterogeneity; Optimization; Heuristics; Variable neighborhood search; p-median problem; Consumer; Context; Branch; Representations; Substitution; Relaxations; Preference; Algorithm
DOI
10.1016/j.ejor.2016.03.033
Chinese Library Classification (CLC) number
C93 [Management science]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
Clustering algorithms partition a set of n objects into p groups (called clusters), such that objects assigned to the same group are homogeneous according to some criterion. To derive these clusters, the required input is often a single n × n dissimilarity matrix. Yet in many applications more than one dissimilarity matrix is available, and to conform to model requirements it is common practice to aggregate the matrices (e.g., by summing or averaging them). This aggregation results in clustering solutions that mask the true nature of the original data. In this paper we introduce a clustering model which, to handle the heterogeneity, uses all available dissimilarity matrices and identifies groups of individuals who cluster objects in a similar way. The model is a nonconvex problem that is difficult to solve exactly, so we introduce a Variable Neighborhood Search heuristic to provide solutions efficiently. Computational experiments and an empirical application to the perception of chocolate candy show that the heuristic is efficient and that the proposed model is suited to recovering heterogeneous data. Implications for clustering researchers are discussed. © 2016 Elsevier B.V. All rights reserved.
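The masking effect of aggregation described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the two toy individuals, the 0/1 dissimilarities, and the use of a brute-force p-median objective (chosen only because "p-median problem" appears among the keywords). It is not the paper's model, nor its Variable Neighborhood Search heuristic.

```python
from itertools import combinations


def block_dissimilarity(groups, n):
    """Toy n x n matrix: 0 within a group, 1 across groups (hypothetical data)."""
    label = {obj: g for g, members in enumerate(groups) for obj in members}
    return [[0.0 if label[i] == label[j] else 1.0 for j in range(n)]
            for i in range(n)]


def p_median_cost(d, medians):
    """Sum over objects of the dissimilarity to the closest chosen median."""
    return sum(min(d[i][m] for m in medians) for i in range(len(d)))


def best_p_median(d, p):
    """Brute-force search over all sets of p medians (fine for tiny n only)."""
    return min((p_median_cost(d, m), m) for m in combinations(range(len(d)), p))


n = 4
# Individual A perceives {0, 1} and {2, 3} as groups; individual B perceives
# {0, 2} and {1, 3}. Both are internally consistent but disagree with each other.
d_a = block_dissimilarity([(0, 1), (2, 3)], n)
d_b = block_dissimilarity([(0, 2), (1, 3)], n)

# Common practice: average the individual matrices into a single one.
d_avg = [[(d_a[i][j] + d_b[i][j]) / 2 for j in range(n)] for i in range(n)]

cost_a, medians_a = best_p_median(d_a, 2)
cost_b, medians_b = best_p_median(d_b, 2)
avg_costs = {m: p_median_cost(d_avg, m) for m in combinations(range(n), 2)}

print("individual A:", cost_a, medians_a)
print("individual B:", cost_b, medians_b)
print("averaged matrix, cost per median pair:", avg_costs)
```

In this toy case each individual's matrix admits a two-cluster solution with zero cost, while on the averaged matrix every pair of medians has the same cost, so the aggregate no longer distinguishes either individual's grouping.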
Pages: 659-672
Number of pages: 14
Related papers
  • [1] A Semi-supervised Clustering Algorithm that Integrates Heterogeneous Dissimilarities and Data Sources
    Martin-Merino, Manuel
    2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 1732 - 1739
  • [2] Semi-supervised Clustering Using Heterogeneous Dissimilarities
    Martin-Merino, Manuel
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2010, 6218 : 375 - 384
  • [3] Model-based clustering with dissimilarities: A Bayesian approach
    Oh, Man-Suk
    Raftery, Adrian E.
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2007, 16 (03) : 559 - 585
  • [4] Parsimonious Bayesian model-based clustering with dissimilarities
    Morrissette, Samuel
    Muthukumarana, Saman
    Turgeon, Maxime
    MACHINE LEARNING WITH APPLICATIONS, 2024, 15
  • [5] A Bayesian sparse finite mixture model for clustering data from a heterogeneous population
    Saraiva, Erlandson F.
    Suzuki, Adriano K.
    Milan, Luis A.
    BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2020, 34 (02) : 323 - 344
  • [6] ON CLUSTERING HETEROGENEOUS DATA AND CLUSTERING BY COMPRESSION
    Carstoiu, Dorin
    Cernian, Alexandra
    Sgarciu, Valentin
    Olteanu, Adriana
    ANNALS OF DAAAM FOR 2009 & PROCEEDINGS OF THE 20TH INTERNATIONAL DAAAM SYMPOSIUM, 2009, 20 : 293 - 294
  • [7] A Model Selection Algorithm For Mixture Model Clustering Of Heterogeneous Multivariate Data
    Erol, Hamza
    2013 IEEE INTERNATIONAL SYMPOSIUM ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (IEEE INISTA), 2013,
  • [8] Learning a Combination of Heterogeneous Dissimilarities from Incomplete Knowledge
    Martin-Merino, Manuel
    ARTIFICIAL NEURAL NETWORKS (ICANN 2010), PT III, 2010, 6354 : 62 - 71
  • [9] PARTIAL DISSIMILARITIES WITH APPLICATION TO CLUSTERING
    BROSSIER, G
    JOURNAL OF CLASSIFICATION, 1994, 11 (01) : 37 - 58
  • [10] Collective, Hierarchical Clustering from distributed, heterogeneous data
    Johnson, EL
    Kargupta, H
    LARGE-SCALE PARALLEL DATA MINING, 2000, 1759 : 221 - 244