Text clustering algorithm based on lexical graph

Cited by: 1
Authors
Sha, Yun [1]
Zhang, Guoying [1]
Jiang, Huina [1]
Affiliations
[1] Beijing Inst Petrochem Technol, Dept Comp Sci, Beijing 102617, Peoples R China
Keywords
DOI
10.1109/FSKD.2007.560
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text clustering methods group texts into thematic clusters, an important task in many fields such as search engines. Well-known text clustering methods, however, do not really address the special problems of text clustering: the very high dimensionality of the data and the understandability of the cluster descriptions. This paper proposes a text clustering algorithm based on a lexical graph, which is a kind of term-based clustering method. The lexical graph is built with nodes representing words and edges representing their co-occurrence in a text. The attribute of each node is the set of texts in which the word occurs. A cluster center is defined as a node (word) with a large degree in this graph; the texts attributed to the center and to its neighbors are assigned to one cluster, whose description is the center node. This approach drastically reduces the dimensionality of the data and improves the synonymy extension ability. An experimental evaluation on web documents as well as classical text documents demonstrates that the proposed algorithm obtains clusterings of comparable quality significantly more efficiently than the K-Means and STC algorithms on the search results data set. Furthermore, the method provides an understandable description of each discovered cluster by its center.
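The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how texts are tokenized, what counts as a "large" degree, or how overlapping clusters are resolved. The sketch therefore assumes pre-tokenized input, treats two words as co-occurring when they appear in the same text, picks a fixed number of highest-degree words as centers (the hypothetical parameter `num_centers`), and assigns each text to the first center that claims it. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def build_lexical_graph(docs):
    """Build the word graph from a list of token lists.

    Returns (neighbors, occurs_in):
      neighbors[w] is the set of words that co-occur with w in some text,
      occurs_in[w] is the set of text indices containing w (the node attribute).
    """
    neighbors = defaultdict(set)
    occurs_in = defaultdict(set)
    for doc_id, tokens in enumerate(docs):
        words = set(tokens)
        for w in words:
            occurs_in[w].add(doc_id)
        # Assumption: an edge links any two words found in the same text.
        for a, b in combinations(sorted(words), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors, occurs_in


def cluster_by_centers(docs, num_centers=2):
    """Greedy sketch: high-degree words become cluster centers.

    Each cluster gathers the texts of the center and of its neighbors.
    Assumption: texts already assigned to an earlier center are skipped.
    """
    neighbors, occurs_in = build_lexical_graph(docs)
    # Rank words by degree (descending); ties are broken alphabetically.
    ranked = sorted(occurs_in, key=lambda w: (-len(neighbors[w]), w))
    clusters = {}
    assigned = set()
    for center in ranked:
        if len(clusters) == num_centers:
            break
        members = set(occurs_in[center])
        for n in neighbors[center]:
            members |= occurs_in[n]
        members -= assigned
        if members:
            # The center word doubles as the cluster description.
            clusters[center] = sorted(members)
            assigned |= members
    return clusters


if __name__ == "__main__":
    sample = [
        ["apple", "fruit", "juice"],
        ["apple", "fruit", "pie"],
        ["python", "code", "bug"],
        ["python", "code", "test"],
    ]
    print(cluster_by_centers(sample, num_centers=2))
```

In this sketch the clusters are keyed by their center word, which mirrors the abstract's point that the center node serves as a readable cluster description. A degree threshold could replace `num_centers`; the abstract leaves that choice open.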
Pages: 277-281
Number of pages: 5
Related papers
  • [1] Graph based AHC Algorithm for Text Clustering
    Jo, Taeho
    [J]. PROCEEDINGS 2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), 2017, : 309 - 314
  • [2] Text Clustering Algorithm Based on Spectral Graph Seriation
    Guo Wensheng
    Li Guohe
    [J]. CCDC 2009: 21ST CHINESE CONTROL AND DECISION CONFERENCE, VOLS 1-6, PROCEEDINGS, 2009, : 4255 - 4259
  • [3] Text Clustering Algorithm Based on Semantic Graph Structure
    Bai, Qiuchan
    Jin, Chunxia
    [J]. PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016, : 312 - 316
  • [4] Short-Text Clustering Algorithm Based on Laplacian Graph
    Meng, Hai-Ning
    Feng, Kai
    Zhu, Lei
    Zhang, Bei-Bei
    Tong, Xin-Yu
    Hei, Xin-Hong
    [J]. Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2021, 49 (09): : 1716 - 1723
  • [5] The Refinement Algorithm Consideration in Text Clustering Scheme Based on Multilevel Graph
    Chen, Jian-bin
    [J]. Wuhan University Journal of Natural Sciences, 2004, (05) : 671 - 675
  • [6] Extractive Text Summarization Using Lexical Association and Graph Based Text Analysis
    Krishna, R. V. V. Murali
    Reddy, Ch. Satyananda
    [J]. COMPUTATIONAL INTELLIGENCE IN DATA MINING, VOL 1, CIDM 2015, 2016, 410 : 261 - 272
  • [7] Text Clustering Algorithm Based on the Graph Structures of Semantic Word Co-occurrence
    Jin, Chun-Xia
    Bai, Qiu-Chan
    [J]. 2016 INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND ARTIFICIAL INTELLIGENCE (ISAI 2016), 2016, : 497 - 502
  • [8] Text clustering based on kernel KNN clustering algorithm
    Xiong, Hao
    Sun, Sheng
    Feng, Yunfang
    [J]. International Journal of Applied Mathematics and Statistics, 2013, 46 (16): : 69 - 75
  • [9] A clustering algorithm based on graph connectivity
    Hartuv, E
    Shamir, R
    [J]. INFORMATION PROCESSING LETTERS, 2000, 76 (4-6) : 175 - 181
  • [10] Graph Clustering: a graph-based clustering algorithm for the electromagnetic calorimeter in LHCb
    Canudas, Nuria Valls
    Gomez, Miriam Calvo
    Vilasis-Cardona, Xavier
    Ribe, Elisabet Golobardes
    [J]. EUROPEAN PHYSICAL JOURNAL C, 2023, 83 (02):