Semantic smoothing for model-based document clustering

Cited by: 0
Authors
Zhang, Xiaodan [1]
Zhou, Xiaohua [1]
Hu, Xiaohua [1]
Affiliation
[1] Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
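Funding
U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract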
A document is often full of class-independent "general" words and short of class-specific "core" words, which makes document clustering difficult. We argue that suitable smoothing of document models in agglomerative approaches, and of cluster models in partitional approaches, relieves both problems and thereby improves clustering quality. To the best of our knowledge, most model-based clustering approaches use Laplacian smoothing to prevent zero probabilities, while most similarity-based approaches employ the heuristic TF*IDF scheme to discount the effect of "general" words. Inspired by a series of statistical translation language models for text retrieval, we propose in this paper a novel smoothing method, referred to as context-sensitive semantic smoothing, for document clustering. Comparative experiments on three datasets show that model-based clustering approaches with semantic smoothing are effective in improving cluster quality.
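The contrast the abstract draws between Laplacian smoothing and a richer smoothing scheme can be illustrated with a minimal sketch. The abstract does not give the authors' formula, so the code below is not their method: it shows standard additive (Laplacian) smoothing of a unigram model, followed by a generic linear interpolation with a second word distribution. The function names, the toy vocabulary, the `translated` distribution, and the mixing weight `lam` are all assumptions made up for illustration.

```python
from collections import Counter


def laplace_unigram(tokens, vocab, alpha=1.0):
    """Additive (Laplacian) smoothing of a unigram model.

    Every vocabulary word receives a nonzero probability.
    Assumes every token in `tokens` belongs to `vocab`.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(base, other, lam=0.5):
    """Linear mixture of a base model with a second word distribution.

    `other` stands in for a model that favours class-specific words;
    how such a model is estimated is outside this sketch.
    """
    return {w: (1 - lam) * base[w] + lam * other.get(w, 0.0) for w in base}


# Toy data, invented for illustration only.
vocab = ["the", "of", "gene", "protein", "stock"]
doc = ["the", "of", "the", "gene"]

base = laplace_unigram(doc, vocab)

# Hypothetical distribution concentrated on "core" words.
translated = {"gene": 0.6, "protein": 0.4}
mixed = interpolate(base, translated, lam=0.4)

for word in vocab:
    print(f"{word:8s} base={base[word]:.3f} mixed={mixed[word]:.3f}")
```

In this toy example the mixture moves probability mass from the "general" words ("the", "of") toward the "core" words ("gene", "protein"), including "protein", which never occurs in the document. Laplacian smoothing alone only guarantees that no word has zero probability.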
Pages: 1193 / +
Page count: 2