A new text clustering method using hidden Markov model

Cited by: 0
Authors
Fu, Yan [1]
Yang, Dongqing [1]
Tang, Shiwei [2]
Wang, Tengjiao [1]
Gao, Aiqiang [1]
Affiliations
[1] Peking Univ, Sch Elect Engn & Comp Sci, Beijing 100871, Peoples R China
[2] Peking Univ, Natl Lab Machine Percept, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Because text data is high-dimensional and semantically interrelated, text clustering remains an important topic in data mining. However, little work has investigated the properties of the clustering process itself; previous studies have focused mainly on the characteristics of the text. Since clustering is a dynamic, sequential process, we describe it as a series of state transitions for words or documents. Taking the K-means method as an example, we parse the clustering process into several sequences. Building on research into clustering sequential and temporal data, we propose a new text clustering method based on the hidden Markov model (HMM). Experiments on the Reuters-21578 corpus show that this approach produces an accurate clustering partition and outperforms the K-means algorithm.
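The abstract describes the method only in outline: K-means is run, each document's cluster assignment over successive iterations is read as a sequence of state transitions, and an HMM is then built on those sequences. The sketch below illustrates the first half of that idea under stated assumptions, not the authors' implementation: documents are L2-normalised term-frequency vectors compared by Euclidean distance, centroids are seeded from the first k documents, each document's per-iteration cluster label is taken as its state sequence, and a first-order transition matrix is estimated by counting label changes. The corpus and all function names (`tf_vectors`, `kmeans_trace`, `transition_matrix`) are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

def tf_vectors(docs):
    """Map tokenised documents to L2-normalised term-frequency vectors."""
    vocab = sorted({w for doc in docs for w in doc})
    vecs = []
    for doc in docs:
        counts = Counter(doc)
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against empty docs
        vecs.append([x / norm for x in v])
    return vocab, vecs

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_trace(vecs, k, max_iter=20):
    """Run K-means, recording each document's cluster label at every iteration.

    Returns (labels, trace): labels are the final assignments and trace[i] is
    document i's label sequence, treated here (an assumption) as its states.
    Centroids are seeded from the first k documents so the run is deterministic.
    """
    centroids = [list(v) for v in vecs[:k]]
    trace = [[] for _ in vecs]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vecs]
        for i, lab in enumerate(new_labels):
            trace[i].append(lab)
        if new_labels == labels:
            break  # no document changed cluster: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, trace

def transition_matrix(trace, k):
    """Maximum-likelihood estimate of P(next label | current label) from traces."""
    counts = [[0] * k for _ in range(k)]
    for seq in trace:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

# Hypothetical toy corpus with two topics ("crude" and "grain").
DOCS = [s.split() for s in [
    "oil crude opec barrel",
    "price grain wheat",
    "oil price",
    "grain wheat corn harvest",
    "grain wheat corn",
    "oil crude price barrel",
]]
vocab, vecs = tf_vectors(DOCS)
labels, trace = kmeans_trace(vecs, k=2)
print(labels)    # [0, 1, 0, 1, 1, 0]
print(trace[2])  # [1, 0, 0]: "oil price" starts in cluster 1, then moves to 0
print(transition_matrix(trace, 2))  # [[1.0, 0.0], [0.142..., 0.857...]]
```

Only the third document changes cluster, so the estimate records one 1→0 transition out of seven transitions leaving state 1. This is a plain Markov-chain count over observed labels; a full HMM would treat states as hidden and fit emission probabilities as well (for example with the Baum-Welch algorithm), and the abstract does not say how the authors define states or train their model.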
Start page: 73
Number of pages: 3
Related papers
50 in total
  • [1] A new text representation method for clustering based on higher order Markov model
    Yang, Weifeng
    Han, Guosheng
    Xie, Xiaoqiang
    [J]. 2ND INTERNATIONAL CONFERENCE ON INFORMATION SYSTEM AND DATA MINING (ICISDM 2018), 2018 : 1 - 6
  • [2] A new-arabic-text classification system using a Hidden Markov Model
    Kechaou, Zied
    Kanoun, Slim
    [J]. INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED AND INTELLIGENT ENGINEERING SYSTEMS, 2014, 18 (04) : 201 - 210
  • [3] CONNECTED AND DEGRADED TEXT RECOGNITION USING HIDDEN MARKOV MODEL
    BOSE, CB
    KUO, SS
    [J]. PATTERN RECOGNITION, 1994, 27 (10) : 1345 - 1363
  • [4] Text mining for medical documents using a Hidden Markov Model
    Jang, Hyeju
    Song, Sa Kwang
    Myaeng, Sung Hyon
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2006, 4182 : 553 - 559
  • [5] Diacritizing Arabic Text Using a Single Hidden Markov Model
    Khorsheed, Mohammad S.
    [J]. IEEE ACCESS, 2018, 6 : 36522 - 36529
  • [6] Clustering sequence data using hidden Markov model representation
    Li, C
    Biswas, G
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS, AND TECHNOLOGY, 1999, 3695 : 14 - 21
  • [7] Handwritten Text Recognition In Odia Script Using Hidden Markov Model
    Bhoi, Suman
    Dogra, D. P.
    Roy, P. P.
    [J]. 2015 FIFTH NATIONAL CONFERENCE ON COMPUTER VISION, PATTERN RECOGNITION, IMAGE PROCESSING AND GRAPHICS (NCVPRIPG), 2015
  • [8] Improving clustering with hidden Markov models using Bayesian model selection
    Li, C
    Biswas, G
    [J]. SMC 2000 CONFERENCE PROCEEDINGS: 2000 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN & CYBERNETICS, VOL 1-5, 2000 : 194 - 199
  • [9] Clustering with Hidden Markov Model on Variable Blocks
    Lin, Lin
    Li, Jia
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [10] Subspace distribution clustering hidden Markov model
    Bocchieri, E
    Mak, BKW
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2001, 9 (03) : 264 - 275