A model for clustering data from heterogeneous dissimilarities

Cited by: 14
Authors
Santi, Everton [1]
Aloise, Daniel [2]
Blanchard, Simon J. [3]
Affiliations
[1] Univ Fed Rio Grande do Norte, Sch Sci & Technol, BR-59072970 Natal, RN, Brazil
[2] Univ Fed Rio Grande do Norte, Dept Comp Engn & Automat, BR-59072970 Natal, RN, Brazil
[3] Georgetown Univ, McDonough Sch Business, Washington, DC 20057 USA
Keywords
Data mining; Clustering; Heterogeneity; Optimization; Heuristics; VARIABLE NEIGHBORHOOD SEARCH; P-MEDIAN PROBLEM; CONSUMER; CONTEXT; BRANCH; REPRESENTATIONS; SUBSTITUTION; RELAXATIONS; PREFERENCE; ALGORITHM
DOI
10.1016/j.ejor.2016.03.033
Chinese Library Classification
C93 [Management Science]
Subject classification codes
12; 1201; 1202; 120202
Abstract
Clustering algorithms partition a set of n objects into p groups (called clusters) such that objects assigned to the same group are homogeneous according to some criterion. To derive these clusters, the required input is often a single n x n dissimilarity matrix. Yet for many applications, more than one instance of the dissimilarity matrix is available, and to conform to model requirements it is common practice to aggregate the matrices (e.g., by summing or averaging them). This aggregation yields clustering solutions that mask the true nature of the original data. In this paper we introduce a clustering model that handles this heterogeneity by using all available dissimilarity matrices and identifying groups of individuals who cluster the objects in a similar way. The model is a nonconvex optimization problem that is difficult to solve exactly, so we introduce a Variable Neighborhood Search heuristic to obtain solutions efficiently. Computational experiments and an empirical application to the perception of chocolate candy show that the heuristic algorithm is efficient and that the proposed model is well suited to recovering heterogeneous data. Implications for clustering researchers are discussed. © 2016 Elsevier B.V. All rights reserved.
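A minimal sketch of the idea, under stated assumptions. We assume a p-median-style objective, which the P-MEDIAN keyword suggests but which is not confirmed here as the authors' exact formulation. Under this assumption, each group of individuals shares one partition of the n objects into p clusters, and each cluster is represented by a median object. The cost is the sum, over individuals, of each object's dissimilarity to its cluster median as measured in that individual's own matrix.

The paper's Variable Neighborhood Search is replaced here by a simpler alternating local search with random restarts. All function names are hypothetical. The exact p-median step enumerates every median set, so it only suits toy sizes.

```python
import random
from itertools import combinations

Matrix = list[list[float]]  # hypothetical alias: an n x n dissimilarity matrix


def sum_matrices(mats: list[Matrix]) -> Matrix:
    """Element-wise sum of several n x n dissimilarity matrices (the 'aggregation' practice)."""
    n = len(mats[0])
    return [[sum(M[i][j] for M in mats) for j in range(n)] for i in range(n)]


def pmedian_cost(D: Matrix, medians) -> float:
    """Sum over objects of the dissimilarity to the closest median in D."""
    return sum(min(D[i][m] for m in medians) for i in range(len(D)))


def best_partition(D: Matrix, p: int) -> list[int]:
    """Exact p-median by enumeration (tiny n only); returns the median each object is assigned to."""
    meds = min(combinations(range(len(D)), p), key=lambda c: pmedian_cost(D, c))
    return [min(meds, key=lambda m: D[i][m]) for i in range(len(D))]


def cost_under(M: Matrix, labels: list[int]) -> float:
    """Cost of one individual's matrix when objects are assigned to medians as in `labels`."""
    return sum(M[i][labels[i]] for i in range(len(M)))


def heterogeneous_clustering(mats, K, p, restarts=20, seed=0, max_iter=50):
    """Alternating local search with restarts (NOT the paper's VNS).

    Returns (total_cost, group index per individual, partition per group).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        groups = [rng.randrange(K) for _ in mats]
        for _ in range(max_iter):
            # Step 1: fit one shared partition per group on the group's summed matrices;
            # an empty group is re-seeded with a randomly chosen individual's matrix.
            parts = []
            for k in range(K):
                members = [M for M, g in zip(mats, groups) if g == k]
                parts.append(best_partition(sum_matrices(members or [rng.choice(mats)]), p))
            # Step 2: move each individual to the group whose partition fits it best.
            new_groups = [min(range(K), key=lambda k: cost_under(M, parts[k])) for M in mats]
            if new_groups == groups:
                break
            groups = new_groups
        total = sum(cost_under(M, parts[g]) for M, g in zip(mats, groups))
        if best is None or total < best[0]:
            best = (total, groups, parts)
    return best


def block_matrix(clusters, n):
    """Toy 0/1 dissimilarity: 0 within a cluster, 1 across clusters."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    return [[0 if label[i] == label[j] else 1 for j in range(n)] for i in range(n)]


# Hypothetical toy data: three respondents see {0,1} vs {2,3}, three see {0,2} vs {1,3}.
A = block_matrix([[0, 1], [2, 3]], 4)
B = block_matrix([[0, 2], [1, 3]], 4)
mats = [A, A, A, B, B, B]
total, groups, parts = heterogeneous_clustering(mats, K=2, p=2)
```

On this toy input, every 2-median solution of the summed matrix has the same cost, so the aggregated data cannot tell the two views apart. The two-group model, by contrast, should place the two kinds of respondents in separate groups and fit each view with zero cost. Re-fitting the partition on each group's summed matrices keeps one partition per group, which is the point of the model as described in the abstract.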
Pages: 659-672
Page count: 14
Related papers
50 in total
  • [31] Clustering Heterogeneous Web Data Using Clustering by Compression. Cluster Validity
    Cernian, Alexandra
    Carstoiu, Dorin
    Olteanu, Adriana
    PROCEEDINGS OF THE 10TH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, 2009: 123-126
  • [32] A Data Model for Heterogeneous Data Sources
    Chirathamjaree, Chaiyaporn
    PROCEEDINGS OF THE ICEBE 2008: IEEE INTERNATIONAL CONFERENCE ON E-BUSINESS ENGINEERING, 2008: 121-127
  • [33] Measuring Similarity of Complex and Heterogeneous Data in Clustering of Large Data Sets
    Bacelar-Nicolau, Helena
    Nicolau, Fernando
    Sousa, Aurga
    Bacelar-Nicolau, Leonor
    BIOCYBERNETICS AND BIOMEDICAL ENGINEERING, 2009, 29(2): 9-18
  • [34] Collective of algorithms with weights for clustering heterogeneous data
    Berikov, Vladimir B.
    VESTNIK TOMSKOGO GOSUDARSTVENNOGO UNIVERSITETA-UPRAVLENIE VYCHISLITELNAJA TEHNIKA I INFORMATIKA-TOMSK STATE UNIVERSITY JOURNAL OF CONTROL AND COMPUTER SCIENCE, 2013, 23(2): 22-31
  • [35] Clustering Heterogeneous Data with Mutual Semi-supervision
    Abdullin, Artur
    Nasraoui, Olfa
    STRING PROCESSING AND INFORMATION RETRIEVAL: 19TH INTERNATIONAL SYMPOSIUM, SPIRE 2012, 2012, 7608: 18-29
  • [36] Clustering on hierarchical heterogeneous data with prior pairwise relationships
    Han, Wei
    Zhang, Sanguo
    Gao, Hailong
    Bu, Deliang
    BMC BIOINFORMATICS, 2024, 25(1)
  • [37] Membership-based clustering of heterogeneous fuzzy data
    Herbst, Gernot
    Hempel, Arne-Jens
    Fletling, Rainer
    Bocklisch, Steffen F.
    PROCEEDINGS OF THE 7TH CONFERENCE OF THE EUROPEAN SOCIETY FOR FUZZY LOGIC AND TECHNOLOGY (EUSFLAT-2011) AND LFA-2011, 2011: 283-289
  • [38] Effective Detection of Rare Anomalies from Massive Waveform Data Using Heterogeneous Clustering
    Goto, Masaharu
    Chikamatsu, Kiyoshi
    Kobayashi, Naoki
    Ren, Gang
    Ogihara, Mitsunori
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020: 1513-1522
  • [39] MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation
    Kumari, Chandrani
    Siddharthan, Rahul
    PLOS ONE, 2024, 19(4)
  • [40] Clustering in video data: Dealing with heterogeneous semantics of features
    Harit, G
    Chaudhury, S
    PATTERN RECOGNITION, 2006, 39(5): 789-811