Topic Word Set-Based Text Clustering

Authors
Ghazifard, Amir Mehdi [1 ]
Shams, Mohammadreza [2 ]
Shamaee, Zeinab [3 ]
Affiliations
[1] Univ Isfahan, E Learning Dept, Esfahan, Iran
[2] Univ Tehran, ECE Dept, Tehran 14174, Iran
[3] Isfahan Univ Technol, ECE Dept, Esfahan, Iran
Keywords
e-commerce; clustering; classification; term correlation graph; topic word set;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Clustering is the task of grouping related and similar data without any prior knowledge of the labels. In some real-world applications, we face huge amounts of unstructured, unorganized textual data. In these situations, clustering is a primitive operation that must be performed to support later e-commerce tasks. Clustering can enhance e-commerce applications such as recommender systems, customer relationship management systems, and personal assistant agents. In this paper we propose a new method for text clustering: we construct a term correlation graph, extract topic word sets from it, and finally assign each document to its related topic with the help of a classification algorithm such as SVM. This method provides a natural and understandable description of each cluster through its topic word set. It also lets us decide the cluster of a document only when needed and in a parallel fashion, thus significantly reducing offline processing time. Our clustering method also outperforms the well-known k-means clustering algorithm according to clustering quality measures.
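The three stages named in the abstract (term correlation graph, topic word sets, per-document assignment) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how term correlation is measured or how topic word sets are extracted, so the sketch assumes document-level co-occurrence counts for the graph and connected components for the topic word sets. The assignment step substitutes a simple term-overlap score for the SVM classifier so that the example needs only the standard library. All function names and the `min_cooccurrence` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase and split on whitespace (a real system would also drop stop words)."""
    return text.lower().split()


def build_term_graph(docs, min_cooccurrence=2):
    """Assumed correlation measure: link two terms that share at least
    min_cooccurrence documents."""
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))
        pair_counts.update(combinations(terms, 2))
    graph = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def extract_topic_word_sets(graph):
    """Assumed extraction rule: each connected component is one topic word set."""
    seen, topics = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(graph[term] - component)
        seen |= component
        topics.append(component)
    return topics


def assign_topic(doc, topics):
    """Stand-in for the SVM step: pick the topic sharing the most terms with doc.
    Returns None when no topic shares any term."""
    terms = set(tokenize(doc))
    scores = [len(terms & topic) for topic in topics]
    best = max(range(len(topics)), key=scores.__getitem__, default=None)
    return best if best is not None and scores[best] > 0 else None
```

Because `assign_topic` depends only on the fixed topic word sets and a single document, each document can be assigned independently and on demand, which mirrors the abstract's point about deciding clusters only when needed and in parallel.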
Pages: 11