Hierarchical document clustering using frequent itemsets

Cited by: 0

Authors:

Fung, BCM [1]

Wang, K [1]

Ester, M [1]

Affiliations:

[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada

Source:

PROCEEDINGS OF THE THIRD SIAM INTERNATIONAL CONFERENCE ON DATA MINING | 2003

Keywords:

document clustering; text documents; frequent itemsets;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

A major challenge in document clustering is the extremely high dimensionality. For example, the vocabulary for a document set can easily contain thousands of words. On the other hand, each document often contains only a small fraction of the words in the vocabulary. These features require special handling. Another requirement is hierarchical clustering, where clustered documents can be browsed according to the increasing specificity of topics. In this paper, we propose to use the notion of frequent itemsets, which comes from association rule mining, for document clustering. The intuition of our clustering criterion is that each cluster is identified by some common words, called frequent itemsets, for the documents in the cluster. Frequent itemsets are also used to produce a hierarchical topic tree for clusters. By focusing on frequent items, the dimensionality of the document set is drastically reduced. We show that this method outperforms the best existing methods in terms of both clustering accuracy and scalability.
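The intuition in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it is a brute-force illustration, assuming whitespace-tokenized documents, that finds word sets shared by at least a given fraction of documents, treats each such set as a candidate cluster label, and links each set to the sets that extend it by one word. The function names `frequent_itemsets` and `topic_tree`, the parameters `min_support` and `max_size`, and the sample documents are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Map each frequent word set to the indices of the documents containing it.

    A word set is "frequent" when at least `min_support` (a fraction in 0..1)
    of the documents contain every word in it. Brute force, for illustration.
    """
    n = len(docs)
    word_sets = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*word_sets))
    # Only words that are frequent on their own can appear in a frequent set.
    frequent_words = [
        w for w in vocab
        if sum(w in ws for ws in word_sets) / n >= min_support
    ]
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(frequent_words, size):
            cover = {i for i, ws in enumerate(word_sets) if ws.issuperset(combo)}
            if len(cover) / n >= min_support:
                result[frozenset(combo)] = cover
    return result


def topic_tree(itemsets):
    """Map each frequent itemset to the frequent itemsets one word larger."""
    children = {s: [] for s in itemsets}
    for s in itemsets:
        for w in s:
            parent = s - {w}
            if parent in children:
                children[parent].append(s)
    return children


# Hypothetical sample data: two fruit documents and two car documents.
docs = [
    "apple banana fruit",
    "apple fruit juice",
    "car engine wheel",
    "car engine road",
]
itemsets = frequent_itemsets(docs, min_support=0.5)
tree = topic_tree(itemsets)
```

In this sketch the clustering only ever looks at the handful of frequent words rather than the full vocabulary, which is the dimensionality reduction the abstract refers to. How overlapping candidate clusters are resolved and how the tree is pruned are not described in the abstract and are left out here.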


Pages: 59-70

Page count: 12

50 records in total

[1] Mining fuzzy frequent itemsets for hierarchical document clustering
Chen, Chun-Ling
Tseng, Frank S. C.
Liang, Tyne
[J]. INFORMATION PROCESSING & MANAGEMENT, 2010, 46 (02) : 193 - 211
[2] Hierarchical document clustering using frequent closed sets
Kryszkiewicz, Marzena
Skonieczny, Lukasz
[J]. INTELLIGENT INFORMATION PROCESSING AND WEB MINING, PROCEEDINGS, 2006, : 489 - +
[3] Text clustering using frequent itemsets
Zhang, Wen
Yoshida, Taketoshi
Tang, Xijin
Wang, Qing
[J]. KNOWLEDGE-BASED SYSTEMS, 2010, 23 (05) : 379 - 388
[4] High quality, efficient hierarchical document clustering using closed interesting itemsets
Malik, Hassan H.
Kender, John R.
[J]. ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 991 - +
[5] Text Clustering Using Frequent Weighted Utility Itemsets
Tram Tran
Bay Vo
Tho Thi Ngoc Le
Ngoc Thanh Nguyen
[J]. CYBERNETICS AND SYSTEMS, 2017, 48 (03) : 193 - 209
[6] Frequent Itemset Based Hierarchical Document Clustering Using Wikipedia as External Knowledge
Kiran, G. V. R.
Shankar, Ravi
Pudi, Vikram
[J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT II, 2010, 6277 : 11 - 20
[7] Clustering Frequent Itemsets Based on Generators
Li, Jinhong
Yang, Bingru
Song, Wei
Hou, Wei
[J]. 2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL II, PROCEEDINGS, 2008, : 1083 - +
[8] Approximate Frequent Itemsets Compression Using Dynamic Clustering Method
Yan, Hua
Sang, Yongsheng
[J]. 2008 IEEE CONFERENCE ON CYBERNETICS AND INTELLIGENT SYSTEMS, VOLS 1 AND 2, 2008, : 1110 - 1115
[9] Hierarchical document clustering using local patterns
Malik, Hassan H.
Kender, John R.
Fradkin, Dmitriy
Moerchen, Fabian
[J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2010, 21 (01) : 153 - 185
