Hypergraph based clustering for document similarity using FP growth algorithm

Cited by: 0

Authors:

Ramakrishnan, Nayana [1]

Nair, Meenakshi J. [1]

Jayaprakash, Deepak [1]

Ananthakrishnan, H. [1]

Rani, Siji S. [1]

Affiliation:

[1] Amrita Vishwa Vidyapeetham, Dept Comp Sci & Engn, Amritapuri, India

Source:

PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICCS) | 2019

Keywords:

Hypergraph; Clustering; FP-Growth; Similarity;

DOI:

10.1109/iccs45141.2019.9065630

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Modelling multiple documents for different applications is a major field of research due to the tremendous growth in Web data. To find document similarity, clustering is required to determine the grouping of unlabelled data. Graph models can capture the structural information in texts. They organize high-dimensional data in such a way that the user can effortlessly access the desired information. In this paper, we use a hypergraph, built with the help of association rule mining, to model a collection of text documents and find the similarity between them using a hypergraph partitioning algorithm. We use the FP-Growth algorithm, a recursive elimination scheme, to find the association relationships. We then use a spectral clustering algorithm, which uses the eigenvalues and eigenvectors computed from the matrices, to find similar documents. Experiments show that this algorithm gave better clusters than others, which commonly take higher eigenvectors.
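
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the record does not confirm: documents are treated as sets of terms, each frequent term set becomes one unit-weight hyperedge over the documents containing it, frequent term sets are found by brute-force enumeration (a stand-in that yields the same sets FP-Growth would, without the FP-tree), and the split uses the second-smallest eigenvector of the normalized hypergraph Laplacian. All names (`DOCS`, `frequent_itemsets`, `incidence_matrix`, `spectral_bipartition`) and the toy data are hypothetical.

```python
from itertools import combinations

import numpy as np

# Hypothetical toy corpus: each document is a set of terms.
DOCS = [
    {"graph", "cluster", "edge"},
    {"graph", "cluster", "node"},
    {"graph", "cluster", "edge", "data"},
    {"wine", "grape", "cellar", "data"},
    {"wine", "grape", "taste"},
    {"wine", "grape", "cellar"},
]


def frequent_itemsets(docs, min_support, max_size=2):
    """Brute-force stand-in for FP-Growth: every term set of up to max_size
    terms that occurs in at least min_support documents."""
    terms = sorted(set().union(*docs))
    found = []
    for k in range(1, max_size + 1):
        for cand in combinations(terms, k):
            if sum(set(cand) <= d for d in docs) >= min_support:
                found.append(frozenset(cand))
    return found


def incidence_matrix(docs, itemsets):
    """H[v, e] = 1 when document v contains every term of itemset e."""
    H = np.zeros((len(docs), len(itemsets)))
    for e, itemset in enumerate(itemsets):
        for v, d in enumerate(docs):
            if itemset <= d:
                H[v, e] = 1.0
    return H


def spectral_bipartition(H):
    """Label vertices by the sign of the second-smallest eigenvector of
    L = I - Dv^{-1/2} H De^{-1} H^T Dv^{-1/2} (unit hyperedge weights).
    Assumes every document belongs to at least one hyperedge."""
    dv = H.sum(axis=1)  # vertex degrees
    de = H.sum(axis=0)  # hyperedge sizes
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    theta = (dv_inv_sqrt[:, None] * H / de) @ (H.T * dv_inv_sqrt)
    laplacian = np.eye(H.shape[0]) - theta
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return (eigvecs[:, 1] > 0).astype(int)


ITEMSETS = frequent_itemsets(DOCS, min_support=2)
LABELS = spectral_bipartition(incidence_matrix(DOCS, ITEMSETS))
print(LABELS)
```

On this toy corpus the first three documents and the last three documents fall on opposite sides of the split; which side is labelled 0 or 1 depends on the arbitrary sign of the eigenvector.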


Pages: 332-336

Page count: 5

