Hierarchical document clustering using frequent itemsets

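Authors
Fung, BCM [1]
Wang, K [1]
Ester, M [1]
Affiliation
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
Keywords
document clustering; text documents; frequent itemsets
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract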
A major challenge in document clustering is the extremely high dimensionality. For example, the vocabulary of a document set can easily run to thousands of words, while each document often contains only a small fraction of the words in that vocabulary. These features require special handling. Another requirement is hierarchical clustering, in which clustered documents can be browsed according to increasingly specific topics. In this paper, we propose to use the notion of frequent itemsets, which comes from association rule mining, for document clustering. The intuition behind our clustering criterion is that each cluster is identified by some words that are common to the documents in the cluster; these common words are called frequent itemsets. Frequent itemsets are also used to produce a hierarchical topic tree for the clusters. Because only frequent items are considered, the dimensionality of the document set is drastically reduced. We show that this method outperforms the best existing methods in terms of both clustering accuracy and scalability.
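The abstract describes the approach only at a high level. The following Python sketch illustrates the general idea under simplifying assumptions: documents are given as sets of already-preprocessed words, frequent itemsets are mined with a plain Apriori-style level-wise search, each document is assigned to the largest frequent itemset it contains, and each cluster's parent in the topic tree is the (k-1)-word subset with the highest document support. The function names, the toy documents, and these assignment and parent-selection rules are hypothetical simplifications for illustration, not the procedure described in the paper.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Apriori-style level-wise mining of word sets contained in at least
    min_support * len(docs) documents. Returns {frozenset: support_count}."""
    threshold = min_support * len(docs)
    # Level 1: single words that meet the support threshold.
    counts = {}
    for doc in docs:
        for word in doc:
            key = frozenset([word])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= threshold}
    result = dict(current)
    k = 2
    while current:
        # Candidate k-itemsets: unions of two frequent (k-1)-itemsets whose
        # (k-1)-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count support of each surviving candidate by scanning the documents.
        current = {}
        for cand in candidates:
            support = sum(1 for doc in docs if cand <= doc)
            if support >= threshold:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def assign_clusters(docs, itemsets):
    """Hypothetical rule: put each document in the cluster of the largest
    frequent itemset it contains (ties broken by sorted word order).
    Documents containing no frequent itemset go to the root (empty set)."""
    clusters = {frozenset(): []}
    for s in itemsets:
        clusters[s] = []
    for i, doc in enumerate(docs):
        contained = [s for s in itemsets if s <= doc]
        if contained:
            best = min(contained, key=lambda s: (-len(s), sorted(s)))
        else:
            best = frozenset()
        clusters[best].append(i)
    return clusters


def build_topic_tree(itemsets):
    """Hypothetical rule: link each k-itemset to the (k-1)-subset with the
    highest support; 1-itemsets hang off the root. Returns {child: parent}."""
    parent = {}
    for s in itemsets:
        if len(s) == 1:
            parent[s] = frozenset()
        else:
            # Every subset of a frequent itemset is frequent, so all
            # (k-1)-subsets are keys of `itemsets`.
            subsets = [frozenset(sub) for sub in combinations(sorted(s), len(s) - 1)]
            parent[s] = min(subsets, key=lambda p: (-itemsets[p], sorted(p)))
    return parent


# Toy corpus (made up for illustration): two loosely separated topics.
docs = [
    {"apple", "banana", "fruit"},
    {"apple", "fruit", "juice"},
    {"banana", "fruit", "smoothie"},
    {"engine", "car", "wheel"},
    {"car", "wheel", "tire"},
    {"car", "engine", "fuel"},
]
itemsets = frequent_itemsets(docs, min_support=0.3)
clusters = assign_clusters(docs, itemsets)
tree = build_topic_tree(itemsets)
```

On this toy corpus, documents 0 and 1 land in the {apple, fruit} cluster, which hangs under {fruit} in the tree, and the car-related documents end up under {car}. Only the frequent words (here six of the eleven distinct words) take part in clustering, which mirrors the dimensionality reduction the abstract describes.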
Pages: 59–70
Number of pages: 12