Topic Word Set-Based Text Clustering

Cited by: 0
Authors
Ghazifard, Amir Mehdi [1]
Shams, Mohammadreza [2]
Shamaee, Zeinab [3]
Affiliations
[1] Univ Isfahan, E Learning Dept, Esfahan, Iran
[2] Univ Tehran, ECE Dept, Tehran 14174, Iran
[3] Isfahan Univ Technol, ECE Dept, Esfahan, Iran
Keywords
e-commerce; clustering; classification; term correlation graph; topic word set
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Clustering groups related and similar data without any prior knowledge of the labels. Some real-world applications face huge amounts of unstructured, unorganized textual data. In these situations, clustering is a primitive operation that must be performed to support later e-commerce tasks. It can enhance e-commerce applications such as recommender systems, customer relationship management systems, and personal assistant agents. In this paper we propose a new method for text clustering: we construct a term correlation graph, extract topic word sets from it, and finally assign each document to its related topic with the help of a classification algorithm such as SVM. The method provides a natural and understandable description of each cluster through its topic word set. It also lets us decide the cluster of a document only when needed and in parallel, thus significantly reducing offline processing time. Our clustering method also outperforms the well-known k-means clustering algorithm according to clustering quality measures.
Pages: 11
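
The three stages named in the abstract (term correlation graph, topic word sets, document assignment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how correlations are measured or how topic word sets are extracted, so the sketch assumes document-level co-occurrence counts with a hypothetical `min_cooccur` threshold and treats each connected component of the graph as a topic word set. The paper assigns documents with a classifier such as SVM; the sketch swaps in a plain term-overlap rule to stay dependency-free. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_term_graph(docs, min_cooccur=2):
    """Link two terms when they co-occur in at least `min_cooccur` documents.

    Assumption: raw co-occurrence counts stand in for the paper's
    (unspecified) term correlation measure.
    """
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def topic_word_sets(graph):
    """Treat each connected component as one topic word set (an assumption)."""
    seen, topics = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first traversal
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(graph[term] - component)
        seen |= component
        topics.append(component)
    return topics


def assign_topic(doc, topics):
    """Return the index of the topic sharing the most terms with `doc`.

    Stand-in for the SVM-style classifier mentioned in the abstract.
    Returns None when no topic shares any term with the document.
    """
    terms = set(doc.lower().split())
    overlaps = [len(terms & topic) for topic in topics]
    best = max(range(len(topics)), key=overlaps.__getitem__, default=None)
    if best is None or overlaps[best] == 0:
        return None
    return best


# Toy corpus, invented for illustration only.
docs = [
    "cheap laptop battery charger",
    "laptop battery replacement",
    "running shoes sale",
    "trail running shoes",
]
topics = topic_word_sets(build_term_graph(docs))
print([sorted(t) for t in topics])  # [['battery', 'laptop'], ['running', 'shoes']]
print(assign_topic("new laptop battery", topics))  # 0
```

Because `assign_topic` depends only on the extracted topic word sets, each document can be assigned independently and on demand, which mirrors the abstract's point about deferring and parallelizing the per-document step.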