Data clustering using efficient similarity measures

Cited by: 18
Authors
Bisandu, Desmond Bala [1 ]
Prasad, Rajesh [2 ]
Liman, Musa Muhammad [3 ]
Affiliations
[1] Univ Jos, Dept Comp Sci, PMB 2084, Jos 930001, Plateau State, Nigeria
[2] African Univ Sci & Technol, Dept Comp Sci, PMB 681 Garki, Abuja Fct, Nigeria
[3] Univ Putra Malaysia, Dept Comp Sci, Serdang 43400, Selangor, Malaysia
Keywords
Similarity measure; Document clustering; Text document; Euclidean distance and Edit distance; SELECTION;
DOI
10.1080/09720510.2019.1565443
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
The need to apply similarity measures appropriately in clustering has grown over the years as data volumes keep increasing massively. Deciding which similarity measure is best, and for what kind of dataset, has been a very cumbersome task in data mining, data science, and other related fields, and for organizations that depend heavily on the knowledge extracted from huge sets of data to make vital decisions. Because various datasets share some common features, a clearer understanding of the various similarity measures for clustering different datasets is needed. This paper presents a critical review of various similarity measures applied in text and data clustering. A theoretical comparison has been made to check the suitability of the measures on different kinds of datasets.
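As a rough illustration of the two measures named in the keywords, the sketch below computes a Euclidean distance between numeric vectors and a Levenshtein-style edit distance between strings. It is not taken from the paper: the function names `euclidean_distance` and `edit_distance` are hypothetical, and unit costs for insertion, deletion, and substitution are an assumption.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def edit_distance(s, t):
    """Levenshtein edit distance between two strings (assumed unit costs)."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs from s
                            curr[j - 1] + 1,      # insert ct into s
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

Euclidean distance suits documents already represented as numeric vectors (for example, term-frequency vectors), while edit distance compares raw strings directly; smaller values indicate greater similarity in both cases.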
Pages: 901 - 922
Number of pages: 22
Related papers
50 in total
  • [1] Efficient text document clustering with new similarity measures
    Lakshmi R.
    Baskar S.
    International Journal of Business Intelligence and Data Mining, 2021, 18 (01) : 109 - 126
  • [2] An Analysis of Efficient Clustering Methods for Estimates Similarity Measures
    Jagatheeshkumar, G.
    Brunda, S. Selva
    2017 4TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2017,
  • [3] Clustering of Complex Data-sets using Fractal Similarity Measures and Uncertainties
    Hoecker, Maximilian
    Polsterer, Kai Lars
    Kuegler, Sven Dennis
    Heuveline, Vincent
    2015 IEEE 18TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (CSE), 2015, : 82 - 91
  • [4] Data Clustering Method based on Mixed Similarity Measures
    Ali, Doaa S.
    Ghoneim, Ayman
    Saleh, Mohamed
    PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON OPERATIONS RESEARCH AND ENTERPRISE SYSTEMS (ICORES), 2017, : 192 - 199
  • [5] Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering
    Sulc, Zdenek
    Rezankova, Hana
    JOURNAL OF CLASSIFICATION, 2019, 36 (01) : 58 - 72
  • [6] Comparison of Similarity Measures for Categorical Data in Hierarchical Clustering
    Zdeněk Šulc
    Hana Řezanková
    Journal of Classification, 2019, 36 : 58 - 72
  • [7] Fuzzy Clustering of Incomplete Data by Means of Similarity Measures
    Hu, Zhengbing
    Bodyanskiy, Yevgeniy, V
    Tyshchenko, Oleksii K.
    Shafronenko, Alina
    2019 IEEE 2ND UKRAINE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (UKRCON-2019), 2019, : 957 - 960
  • [8] Clustering of Argument Graphs Using Semantic Similarity Measures
    Block, Karsten
    Trumm, Simon
    Sahitaj, Premtim
    Ollinger, Stefan
    Bergmann, Ralph
    ADVANCES IN ARTIFICIAL INTELLIGENCE, KI 2019, 2019, 11793 : 101 - 114
  • [9] A Comparison Study on Similarity and Dissimilarity Measures in Clustering Continuous Data
    Shirkhorshidi, Ali Seyed
    Aghabozorgi, Saeed
    Teh Ying Wah
    PLOS ONE, 2015, 10 (12):
  • [10] Similarity Measures for Spatial Clustering
    Hamdad, Leila
    Benatchba, Karima
    Ifrez, Soraya
    Mohguen, Yasmine
    COMPUTATIONAL INTELLIGENCE AND ITS APPLICATIONS, 2018, 522 : 25 - 36