Data clustering using efficient similarity measures

Cited by: 18
Authors
Bisandu, Desmond Bala [1]
Prasad, Rajesh [2]
Liman, Musa Muhammad [3]
Affiliations
[1] Univ Jos, Dept Comp Sci, PMB 2084, Jos 930001, Plateau State, Nigeria
[2] African Univ Sci & Technol, Dept Comp Sci, PMB 681 Garki, Abuja Fct, Nigeria
[3] Univ Putra Malaysia, Dept Comp Sci, Serdang 43400, Selangor, Malaysia
Keywords
Similarity measure; Document clustering; Text document; Euclidean distance and Edit distance; SELECTION;
DOI
10.1080/09720510.2019.1565443
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
As data volumes keep growing, the need to apply the various similarity measures appropriately in clustering has grown with them. Deciding which similarity measure is best, and for what kind of dataset, has been a cumbersome task in data mining, data science, and related fields, as well as for organizations that depend heavily on knowledge extracted from large datasets to make crucial decisions. Because different datasets exhibit characteristic features, a clearer understanding of the similarity measures used to cluster them is needed. This paper presents a critical review of various similarity measures applied in text and data clustering. A theoretical comparison is made to assess the suitability of the measures for different kinds of datasets.
Pages: 901-922
Page count: 22