Data clustering using efficient similarity measures

Cited by: 18
Authors
Bisandu, Desmond Bala [1]
Prasad, Rajesh [2]
Liman, Musa Muhammad [3]
Affiliations
[1] Univ Jos, Dept Comp Sci, PMB 2084, Jos 930001, Plateau State, Nigeria
[2] African Univ Sci & Technol, Dept Comp Sci, PMB 681 Garki, Abuja Fct, Nigeria
[3] Univ Putra Malaysia, Dept Comp Sci, Serdang 43400, Selangor, Malaysia
Keywords
Similarity measure; Document clustering; Text document; Euclidean distance and Edit distance; SELECTION;
DOI
10.1080/09720510.2019.1565443
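Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes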
020208 ; 070103 ; 0714 ;
Abstract
The need to apply similarity measures appropriately in clustering has grown over the years as data volumes keep increasing. Deciding which similarity measure is best for which kind of dataset has been a cumbersome task in data mining, data science, and related fields, as well as in organizations that depend heavily on knowledge extracted from large datasets to make crucial decisions. Because different datasets share some common features, a clearer understanding of the various similarity measures used to cluster different datasets is needed. This paper presents a critical review of various similarity measures applied in text and data clustering. A theoretical comparison is made to assess the suitability of the measures for different kinds of datasets.
Pages: 901-922
Page count: 22