Hypergraph based clustering for document similarity using FP growth algorithm

Cited by: 0
Authors
Ramakrishnan, Nayana [1]
Nair, Meenakshi J. [1]
Jayaprakash, Deepak [1]
Ananthakrishnan, H. [1]
Rani, Siji S. [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Dept Comp Sci & Engn, Amritapuri, India
Source
PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICCS) | 2019
Keywords
Hypergraph; Clustering; FP-Growth; Similarity;
DOI
10.1109/iccs45141.2019.9065630
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Modelling multiple documents for different applications is a major field of research because of the tremendous growth in Web data. Finding document similarity requires clustering, which determines how unlabelled data are grouped. Graph models can capture the structural information in texts, and they organize high-dimensional data so that the user can easily access the desired information. In this paper, we use a hypergraph, built with the help of association rule mining, to model a collection of text documents, and we find the similarity between them using a hypergraph partitioning algorithm. We use the FP-Growth algorithm, a recursive elimination scheme, to find the association relationships. We then use a spectral clustering algorithm, which finds similar documents using the eigenvalues and eigenvectors computed from the matrices. Experiments show that this algorithm gave better clusters than others, which commonly take higher eigenvectors.
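The hypergraph construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one possible reading, in which each frequent term set becomes a hyperedge joining the documents that contain it. The frequent sets are found by brute-force enumeration as a plain stand-in for FP-Growth, and the spectral partitioning step is omitted. The function names, toy documents, and `min_support` value are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset search (a stand-in for FP-Growth, no FP-tree)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            # Support = number of transactions containing every item in the candidate.
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            break  # no frequent set of this size, so no larger one can be frequent
    return result


def build_hyperedges(docs, min_support, max_size=3):
    """Assumed construction: one hyperedge per frequent term set,
    containing the documents that include all of its terms."""
    itemsets = frequent_itemsets(list(docs.values()), min_support, max_size)
    return {
        terms: frozenset(d for d, words in docs.items() if terms <= words)
        for terms in itemsets
    }


def shared_hyperedge_counts(hyperedges):
    """Count, for each document pair, how many hyperedges contain both documents."""
    counts = {}
    for members in hyperedges.values():
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


# Hypothetical toy corpus: each document is reduced to a set of terms.
docs = {
    "d1": {"graph", "cluster", "spectral"},
    "d2": {"graph", "cluster", "partition"},
    "d3": {"protein", "gene", "sequence"},
    "d4": {"protein", "gene", "genome"},
}
edges = build_hyperedges(docs, min_support=2)
pairs = shared_hyperedge_counts(edges)
```

In this toy corpus, only the pairs (d1, d2) and (d3, d4) share hyperedges. A partitioning step, such as the spectral clustering named in the abstract, would then be expected to separate those two groups.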
Pages: 332-336
Page count: 5
Related papers
50 in total
  • [41] A Similarity Based Agglomerative Clustering Algorithm in Networks
    Liu, Zhiyuan
    Wang, Xiujuan
    Ma, Yinghong
    NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017), 2018, 10615
  • [42] A Clustering Algorithm Based on Variance-Similarity
    Li, Zhendong
    Li, Fei
    MEASUREMENT TECHNOLOGY AND ENGINEERING RESEARCHES IN INDUSTRY, PTS 1-3, 2013, 333-335 : 1306 - +
  • [43] WAF-based Document Clustering Algorithm
    Luo, Yang
    Chen, Guang
    Zhang, Yongtian
    2011 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), VOLS 1-4, 2012, : 14 - 16
  • [44] Analysis of similarity measures with WordNet based text document clustering
    Sandhya, Nadella
    Govardhan, A.
    Advances in Intelligent and Soft Computing, 2012, 132 AISC : 703 - 714
  • [45] Novel Similarity Measure for Document Clustering Based on Topic Phrases
    ELdesoky, A. E.
    Saleh, M.
    Sakr, N. A.
    ICNM: 2009 INTERNATIONAL CONFERENCE ON NETWORKING & MEDIA CONVERGENCE, 2007, : 92 - +
  • [46] Affinity-based similarity measure for web document clustering
    Shyu, ML
    Chen, SC
    Chen, M
    Rubin, SH
    PROCEEDINGS OF THE 2004 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI-2004), 2004, : 247 - 252
  • [47] Analysis of Similarity Measures with WordNet Based Text Document Clustering
    Sandhya, Nadella
    Govardhan, A.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS 2012 (INDIA 2012), 2012, 132 : 703 - +
  • [48] An algorithm of document refinement based on sentence similarity computation
    Ma, Ting
    Wang, Daling
    Yu, Ge
    Hu, Baoshun
    Chen, Dongling
    Journal of Computational Information Systems, 2007, 3 (05): : 1875 - 1880
  • [49] Document Clustering Using K-Means with Term Weighting as Similarity-Based Constraints
    Buatoom, Uraiwan
    Kongprawechnon, Waree
    Theeramunkong, Thanaruk
    SYMMETRY-BASEL, 2020, 12 (06):
  • [50] Hypergraph Clustering Based on PageRank
    Takai, Yuuki
    Miyauchi, Atsushi
    Ikeda, Masahiro
    Yoshida, Yuichi
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 1970 - 1978