Hierarchical Document Clustering based on Cosine Similarity measure

Cited by: 0
Authors
Popat, Shraddha K. [1]
Deshmukh, Pramod B. [1]
Metre, Vishakha A. [1]
Affiliations
[1] DY Patil Coll Engn, Pune, Maharashtra, India
Keywords
Cluster; Document cluster; Similarity;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Clustering is one of the prime topics in data mining. It partitions the data into meaningful subgroups. Document clustering divides a set of documents into groups such that different groups show different characteristics with respect to likeness. This paper presents an experimental exploration of a similarity-based method, HSC, for measuring the similarity between data objects, particularly text documents. It also provides an incremental algorithm that evaluates cluster likeness between documents and leads to much improved results over other traditional methods. Finally, it focuses on selecting an appropriate similarity measure for analyzing the similarity between documents.
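The abstract does not spell out the HSC algorithm, so the following is only a minimal illustrative sketch of the general idea named in the title: documents are turned into term-frequency vectors, compared with cosine similarity, and merged bottom-up. The average-link merge rule, the `threshold` parameter, the whitespace tokenizer, and the function names `cosine_similarity` and `agglomerative_clusters` are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def agglomerative_clusters(docs, threshold=0.3):
    """Average-link agglomerative clustering (an assumed merge rule).

    Repeatedly merges the two most similar clusters and stops once the
    best available pair falls below `threshold` (a hypothetical cutoff).
    Returns clusters as sorted lists of document indices.
    """
    # Naive tokenizer: lowercase and split on whitespace.
    vectors = [Counter(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def avg_sim(c1, c2):
        total = sum(
            cosine_similarity(vectors[i], vectors[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = avg_sim(clusters[x], clusters[y])
                if s > best_sim:
                    best_sim, best_pair = s, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


docs = [
    "data mining clustering of documents",
    "document clustering in data mining",
    "football match results and scores",
    "football scores from the weekend match",
]
print(agglomerative_clusters(docs))  # → [[0, 1], [2, 3]]
```

In this toy input the first two documents share three of their five terms (cosine 0.6), the last two share three terms as well, and no term crosses between the pairs, so the merging stops at two clusters. A real pipeline would typically add stop-word removal, stemming, and TF-IDF weighting before computing similarities.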
Pages: 153-159
Number of pages: 7