Text Document Clustering Using Community Discovery Approach

Cited by: 2
Authors
Beniwal, Anu [1]
Roy, Gourav [1]
Bhavani, S. Durga [1]
Affiliations
[1] Univ Hyderabad, Hyderabad, India
Keywords
Social networks; Louvain community discovery algorithm; Clustering
DOI
10.1007/978-3-030-36987-3_22
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Document clustering is the automatic grouping of text documents into groups of similar documents. In the supervised setting this problem yields good results, whereas for unannotated data the unsupervised machine learning approach does not always do so. Algorithms such as K-Means clustering are the most popular choice when the class labels are not known. The objective of this work is to apply community discovery algorithms from the social network analysis literature to detect the underlying groups in text data. We model the corpus of documents as a graph whose nodes are the distinct non-trivial words of the whole corpus; an edge is added between two word nodes if the corresponding words occur together in at least one document. The weight of the edge between two word nodes is the number of documents in which the two words co-occur. We apply the fast Louvain community discovery algorithm to detect communities. The challenge is to interpret the communities as classes. For the case where the number of communities obtained is greater than the required number of classes, a merging technique is proposed. Each document is assigned to the community with which it shares the maximum number of words. The main thrust of the paper is to present a novel approach to document clustering using community discovery algorithms. The proposed algorithm is evaluated on a few benchmark data sets, and we find that it gives competitive results on the majority of the data sets when compared to standard clustering algorithms.
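The graph construction and the document assignment rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the whitespace tokenizer, the toy documents, and the function names `tokenize`, `cooccurrence_edges`, and `assign_community` are all hypothetical. The community detection step itself is not reproduced here; the `communities` list is a hand-written placeholder standing in for the output of a Louvain run on the weighted graph, and the merging technique mentioned in the abstract is not shown because the abstract does not specify it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; the abstract only says "non-trivial words".
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "on"}


def tokenize(text):
    """Return the set of distinct non-trivial words in one document."""
    return {w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS}


def cooccurrence_edges(docs):
    """Map each unordered word pair to the number of documents containing both words."""
    weights = Counter()
    for text in docs:
        # Sorting gives every pair a canonical (alphabetical) order.
        words = sorted(tokenize(text))
        weights.update(combinations(words, 2))
    return weights


def assign_community(text, communities):
    """Return the index of the community sharing the most words with the document.

    Ties are broken in favour of the lowest index (an assumption of this sketch).
    """
    words = tokenize(text)
    overlaps = [len(words & community) for community in communities]
    return overlaps.index(max(overlaps))


# Toy corpus, invented for illustration only.
docs = [
    "the striker scored a goal",
    "goal scored in the match",
    "the election vote count",
    "vote in the election",
]

edges = cooccurrence_edges(docs)

# Placeholder communities standing in for the result of a Louvain run on `edges`.
communities = [
    {"striker", "scored", "goal", "match"},
    {"election", "vote", "count"},
]

labels = [assign_community(doc, communities) for doc in docs]
```

In this sketch `edges` is a plain mapping from word pairs to weights, so it can be handed to any graph library that accepts weighted edge lists before running a Louvain implementation on it.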
Pages: 336-346
Page count: 11