Mining fuzzy frequent itemsets for hierarchical document clustering

Cited by: 34
Authors
Chen, Chun-Ling [1]
Tseng, Frank S. C. [2]
Liang, Tyne [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu 300, Taiwan
[2] Natl Kaohsiung First Univ Sci & Technol, Dept Informat Management, Yenchao 824, Kaohsiung, Taiwan
Keywords
Fuzzy association rule mining; Text mining; Hierarchical document clustering; Frequent itemsets
DOI
10.1016/j.ipm.2009.09.009
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
As the number of text documents on the Internet grows explosively, hierarchical document clustering has proven useful for grouping similar documents in a wide range of applications. However, most document clustering methods still struggle with high dimensionality, scalability, accuracy, and meaningful cluster labels. In this paper, we present an effective Fuzzy Frequent Itemset-Based Hierarchical Clustering (F²IHC) approach, which uses a fuzzy association rule mining algorithm to improve the clustering accuracy of the Frequent Itemset-Based Hierarchical Clustering (FIHC) method. In our approach, key terms are extracted from the document set, and each document is pre-processed into the designated representation for the subsequent mining process. Then, a fuzzy association rule mining algorithm for text is employed to discover a set of highly related fuzzy frequent itemsets, which contain key terms to be regarded as the labels of the candidate clusters. Finally, the documents are clustered into a hierarchical cluster tree by referring to these candidate clusters. We have conducted experiments to evaluate performance on the Classic4, Hitech, Re0, Reuters, and Wap datasets. The experimental results show that our approach not only retains the merits of FIHC but also improves its clustering accuracy. Crown Copyright (C) 2009 Published by Elsevier Ltd. All rights reserved.
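The mining step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the membership functions, the support definition, or the thresholds, so everything here is an assumption made for illustration: the toy corpus `DOCS`, the three triangular regions (`low`, `mid`, `high`) over term frequency, the use of the minimum operator to combine memberships, and the helper names `memberships`, `fuzzy_support`, and `mine`. It is a generic Apriori-style search over fuzzy items, not the authors' algorithm.

```python
from itertools import combinations

# Hypothetical toy corpus: document id -> {key term: term frequency}.
DOCS = {
    "d1": {"fuzzy": 4, "mining": 4},
    "d2": {"fuzzy": 5, "mining": 3},
    "d3": {"cluster": 4, "mining": 2},
    "d4": {"fuzzy": 4, "cluster": 5},
}


def memberships(tf):
    """Map a term frequency to assumed fuzzy regions (not taken from the paper)."""
    if tf <= 0:  # an absent term belongs to no region
        return {"low": 0.0, "mid": 0.0, "high": 0.0}

    def clamp(x):
        return max(0.0, min(1.0, x))

    return {
        "low": clamp((3 - tf) / 2),         # 1 at tf=1, 0 from tf=3
        "mid": clamp(1 - abs(tf - 3) / 2),  # peaks at tf=3
        "high": clamp((tf - 3) / 2),        # 0 up to tf=3, 1 from tf=5
    }


def fuzzy_support(itemset, docs):
    """Mean over documents of the minimum membership among the (term, region) items."""
    total = 0.0
    for terms in docs.values():
        total += min(memberships(terms.get(t, 0))[r] for t, r in itemset)
    return total / len(docs)


def mine(docs, min_support):
    """Level-wise (Apriori-style) search for fuzzy frequent itemsets."""
    terms = sorted({t for d in docs.values() for t in d})
    level = [frozenset([(t, r)]) for t in terms for r in ("low", "mid", "high")]
    frequent = {}
    while level:
        kept = []
        for cand in level:
            support = fuzzy_support(cand, docs)
            if support >= min_support:
                frequent[cand] = support
                kept.append(cand)
        # Join survivors into candidates one item larger, skipping any set
        # that would use two regions of the same term.
        size = len(level[0]) + 1
        nxt = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == size and len({t for t, _ in union}) == size:
                nxt.add(union)
        level = list(nxt)
    return frequent


if __name__ == "__main__":
    for itemset, support in sorted(mine(DOCS, 0.3).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), round(support, 3))
```

In a pipeline like the one the abstract outlines, the terms of each surviving itemset would serve as the label of a candidate cluster; assigning documents to those candidates and building the cluster tree are not covered by this sketch.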
Pages: 193-211
Number of pages: 19