Mining fuzzy frequent itemsets for hierarchical document clustering

Cited by: 34
Authors
Chen, Chun-Ling [1]
Tseng, Frank S. C. [2]
Liang, Tyne [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu 300, Taiwan
[2] Natl Kaohsiung First Univ Sci & Technol, Dept Informat Management, Yenchao 824, Kaohsiung, Taiwan
Keywords
Fuzzy association rule mining; Text mining; Hierarchical document clustering; Frequent itemsets
DOI
10.1016/j.ipm.2009.09.009
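CLC number (Chinese Library Classification)
TP [Automation technology; computer technology]
Subject classification code
0812 (Computer Science and Technology)
Abstract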
With the explosive growth of text documents on the Internet, hierarchical document clustering has proven useful for grouping similar documents in a wide range of applications. However, most document clustering methods still struggle with high dimensionality, scalability, accuracy, and the generation of meaningful cluster labels. In this paper, we present an effective Fuzzy Frequent Itemset-Based Hierarchical Clustering (F²IHC) approach, which uses a fuzzy association rule mining algorithm to improve the clustering accuracy of the Frequent Itemset-Based Hierarchical Clustering (FIHC) method. In our approach, key terms are first extracted from the document set, and each document is preprocessed into the designated representation for the subsequent mining process. Then, a fuzzy association rule mining algorithm for text is employed to discover a set of highly related fuzzy frequent itemsets, whose key terms serve as the labels of the candidate clusters. Finally, the documents are clustered into a hierarchical cluster tree with reference to these candidate clusters. We conducted experiments on the Classic4, Hitech, Re0, Reuters, and Wap datasets to evaluate performance. The experimental results show that our approach not only retains the merits of FIHC but also improves its clustering accuracy. Crown Copyright © 2009 Published by Elsevier Ltd. All rights reserved.
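The pipeline summarized in the abstract (fuzzify term frequencies, mine fuzzy frequent itemsets, use them as candidate-cluster labels, then arrange the clusters in a tree) can be illustrated with a small sketch. This is a hypothetical illustration under stated assumptions and is not the authors' F²IHC algorithm. The three linguistic regions (low/mid/high) and their breakpoints, the min-based fuzzy support, and the helpers `fuzzify`, `mine_fuzzy_frequent_itemsets`, `assign_clusters`, and `build_tree` are all assumptions. The document-assignment rule is also a simplified stand-in for FIHC's cluster scoring.

```python
from itertools import combinations

# Assumed linguistic regions for a raw term count in one document. The
# breakpoints are illustrative only; the paper's membership functions may differ.
REGIONS = {
    "low": lambda n: 1.0 if n <= 1 else max(0.0, (3 - n) / 2),
    "mid": lambda n: max(0.0, 1 - abs(n - 3) / 2),
    "high": lambda n: 1.0 if n >= 5 else max(0.0, (n - 3) / 2),
}


def fuzzify(doc):
    """Map each present term to {(term, region): membership degree > 0}."""
    out = {}
    for term, count in doc.items():
        if count <= 0:
            continue  # absent terms contribute no fuzzy items
        for region, mu in REGIONS.items():
            degree = mu(count)
            if degree > 0:
                out[(term, region)] = degree
    return out


def fuzzy_support(itemset, fuzzy_docs):
    """Mean over documents of the minimum membership of the itemset's items."""
    total = sum(min(fd.get(item, 0.0) for item in itemset) for fd in fuzzy_docs)
    return total / len(fuzzy_docs)


def mine_fuzzy_frequent_itemsets(docs, min_sup):
    """Apriori-style level-wise search over fuzzy (term, region) items."""
    fuzzy_docs = [fuzzify(d) for d in docs]
    items = sorted({item for fd in fuzzy_docs for item in fd})
    frequent = {}
    level = [frozenset([item]) for item in items]
    while level:
        k = len(level[0]) + 1  # size of the next level's candidates
        kept = []
        for cand in level:
            sup = fuzzy_support(cand, fuzzy_docs)
            if sup >= min_sup:
                frequent[cand] = sup
                kept.append(cand)
        next_level = set()
        for a, b in combinations(kept, 2):
            union = a | b
            # Join only itemsets differing by one item, with one region per term.
            if len(union) != k or len({term for term, _ in union}) != k:
                continue
            # Prune candidates that have an infrequent (k-1)-subset.
            if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
                next_level.add(union)
        level = list(next_level)
    return frequent


def assign_clusters(docs, frequent):
    """Hypothetical rule: place each document under the longest frequent
    itemset it supports, breaking ties by membership degree."""
    assignment = []
    for doc in docs:
        fd = fuzzify(doc)
        best, best_key = None, (0, 0.0)
        for itemset in frequent:
            degree = min(fd.get(item, 0.0) for item in itemset)
            key = (len(itemset), degree)
            if degree > 0 and key > best_key:
                best, best_key = itemset, key
        assignment.append(sorted(t for t, _ in best) if best else None)
    return assignment


def build_tree(frequent):
    """Hypothetical: link each k-itemset cluster to its best-supported
    (k-1)-subset, which pruning guarantees is also frequent."""
    parent = {}
    for itemset in frequent:
        if len(itemset) > 1:
            subsets = [frozenset(s) for s in combinations(itemset, len(itemset) - 1)]
            parent[itemset] = max(subsets, key=lambda s: frequent[s])
    return parent


if __name__ == "__main__":
    docs = [
        {"fuzzy": 3, "mining": 3, "cluster": 1},
        {"fuzzy": 3, "mining": 3},
        {"cluster": 3, "tree": 1},
        {"cluster": 3, "tree": 1},
    ]
    frequent = mine_fuzzy_frequent_itemsets(docs, min_sup=0.4)
    for itemset, sup in frequent.items():
        print(sorted(itemset), sup)
    print(assign_clusters(docs, frequent))
```

On the four toy documents in the demo, the sketch keeps the 2-itemsets {fuzzy:mid, mining:mid} and {cluster:mid, tree:low} with fuzzy support 0.5, and places each document under one of them. FIHC's actual cluster scoring, tree pruning, and merging steps are not modeled here. Because `build_tree` uses `max`, ties between equally supported parents are broken by iteration order.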
Pages: 193–211
Page count: 19