A Sequence Based Dynamic SOM Model for Text Clustering

Authors
Gunasinghe, Upuli [1]
Matharage, Sumith [1]
Alahakoon, Damminda [1]
Affiliations
[1] Monash Univ, Fac IT, CCSL, Clayton, Vic 3800, Australia
Keywords
Text clustering; Sequence learning; Growing Self Organizing Map; Text feature selection; Semantics
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text clustering can be considered a four-step process consisting of feature extraction, text representation, document clustering, and cluster interpretation. Most text clustering models treat text as an unordered collection of words. However, the semantics of text are better captured when word sequences are taken into account. In this paper we propose a sequence-based text clustering model in which four novel sequence-based components are introduced, one in each of the four steps of the text clustering process. Experiments conducted on the Reuters dataset and the Sydney Morning Herald (SMH) news archives demonstrate the advantage of the proposed sequence-based model in terms of capturing context with semantics, accuracy, and speed, compared with clustering documents based on single words and with n-gram-based models.
Pages: 8