Topic Word Set-Based Text Clustering

Authors
Ghazifard, Amir Mehdi [1]
Shams, Mohammadreza [2]
Shamaee, Zeinab [3]
Affiliations
[1] Univ Isfahan, E Learning Dept, Esfahan, Iran
[2] Univ Tehran, ECE Dept, Tehran 14174, Iran
[3] Isfahan Univ Technol, ECE Dept, Esfahan, Iran
Keywords
e-commerce; clustering; classification; term correlation graph; topic word set
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Clustering is the task of grouping related and similar data without any prior knowledge of the labels. Some real-world applications involve huge amounts of unstructured, unorganized textual data. In these situations, clustering is a basic operation that must be performed to support later e-commerce tasks. It can enhance e-commerce applications such as recommender systems, customer relationship management systems, and personal assistant agents. In this paper we propose a new method for text clustering: construct a term correlation graph, extract topic word sets from it, and finally assign each document to its related topic with the help of a classification algorithm such as SVM. This method gives clusters a natural and understandable description through their topic word sets. It also lets us decide a document's cluster only when needed and in a parallel fashion, thus significantly reducing the offline processing time. Our clustering method also outperforms the well-known k-means clustering algorithm according to clustering quality measures.
Pages: 11
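
The three-stage pipeline outlined in the abstract (term correlation graph, topic word sets, per-document assignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how term correlation is measured or how topic word sets are extracted, so the sketch assumes document-level co-occurrence counts for the graph and connected components for the topic word sets. To stay standard-library only, the SVM classification step is replaced by a simple term-overlap rule. All function names and the `min_cooccur` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_term_graph(docs, min_cooccur=2):
    """Link two terms when they appear together in at least min_cooccur documents.

    Assumption: co-occurrence counting stands in for the paper's
    (unspecified) term correlation measure.
    """
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    graph = {}
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def topic_word_sets(graph):
    """Return the connected components of the graph as topic word sets.

    Assumption: connected components stand in for the paper's
    (unspecified) extraction procedure.
    """
    seen, topics = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(graph[term] - component)
        seen |= component
        topics.append(component)
    return topics


def assign(doc, topics):
    """Return the index of the topic sharing the most terms with doc, or None.

    Assumption: term overlap replaces the SVM classifier named in the abstract.
    """
    terms = set(doc.lower().split())
    overlaps = [len(terms & t) for t in topics]
    best = max(range(len(topics)), key=overlaps.__getitem__, default=None)
    return best if best is not None and overlaps[best] > 0 else None


docs = [
    "cheap laptop battery",
    "laptop battery charger",
    "cotton summer dress",
    "summer dress sale",
]
graph = build_term_graph(docs)
topics = topic_word_sets(graph)
labels = [assign(d, topics) for d in docs]
```

Because `assign` looks only at one document and the fixed topic word sets, documents can be labeled independently of each other, which is the property the abstract relies on for on-demand and parallel cluster assignment.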