Mining fuzzy frequent itemsets for hierarchical document clustering

Cited by: 34
Authors
Chen, Chun-Ling [1]
Tseng, Frank S. C. [2]
Liang, Tyne [1]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu 300, Taiwan
[2] Natl Kaohsiung First Univ Sci & Technol, Dept Informat Management, Yenchao 824, Kaohsiung, Taiwan
Keywords
Fuzzy association rule mining; Text mining; Hierarchical document clustering; Frequent itemsets
DOI
10.1016/j.ipm.2009.09.009
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As text documents increase explosively on the Internet, hierarchical document clustering has proven useful for grouping similar documents for versatile applications. However, most document clustering methods still struggle with high dimensionality, scalability, accuracy, and meaningful cluster labels. In this paper, we present an effective Fuzzy Frequent Itemset-Based Hierarchical Clustering (F²IHC) approach, which uses a fuzzy association rule mining algorithm to improve the clustering accuracy of the Frequent Itemset-Based Hierarchical Clustering (FIHC) method. In our approach, the key terms are extracted from the document set, and each document is pre-processed into the designated representation for the subsequent mining process. Then, a fuzzy association rule mining algorithm for text is employed to discover a set of highly related fuzzy frequent itemsets, which contain key terms to be regarded as the labels of the candidate clusters. Finally, the documents are clustered into a hierarchical cluster tree by referring to these candidate clusters. We have conducted experiments to evaluate the performance on the Classic4, Hitech, Re0, Reuters, and Wap datasets. The experimental results show that our approach not only retains the merits of FIHC but also improves its clustering accuracy. Crown Copyright (C) 2009 Published by Elsevier Ltd. All rights reserved.
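The mining step described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the three fuzzy regions (Low, Mid, High), the breakpoints of their membership functions, the min-based fuzzy support, and all function names below are assumptions chosen only to show what a "fuzzy frequent itemset" over term frequencies might look like.

```python
from itertools import combinations

REGIONS = ("Low", "Mid", "High")  # assumed fuzzy regions, not taken from the paper


def fuzzify(tf):
    """Map a raw term frequency to membership degrees (assumed breakpoints 1, 3, 5)."""
    if tf <= 0:
        return {"Low": 0.0, "Mid": 0.0, "High": 0.0}
    low = max(0.0, min(1.0, (3 - tf) / 2))    # 1 at tf <= 1, falls to 0 at tf = 3
    high = max(0.0, min(1.0, (tf - 3) / 2))   # 0 at tf <= 3, rises to 1 at tf >= 5
    mid = max(0.0, 1.0 - low - high)          # peaks at tf = 3
    return {"Low": low, "Mid": mid, "High": high}


def fuzzy_support(itemset, docs):
    """Average over documents of the minimum membership degree among the items."""
    total = 0.0
    for doc in docs:
        total += min(fuzzify(doc.get(term, 0))[region] for term, region in itemset)
    return total / len(docs)


def mine(docs, min_support, max_size=2):
    """Level-wise search for itemsets whose fuzzy support reaches min_support."""
    terms = sorted({term for doc in docs for term in doc})
    frequent, kept = {}, []
    for item in [(t, r) for t in terms for r in REGIONS]:
        support = fuzzy_support((item,), docs)
        if support >= min_support:
            frequent[(item,)] = support
            kept.append(item)
    # Min-based support never grows when items are added, so larger candidates
    # only need to be built from frequent single items.
    for size in range(2, max_size + 1):
        for candidate in combinations(kept, size):
            if len({term for term, _ in candidate}) < size:
                continue  # skip itemsets that repeat a term in two regions
            support = fuzzy_support(candidate, docs)
            if support >= min_support:
                frequent[candidate] = support
    return frequent


# Toy documents as term-frequency dictionaries (made-up data).
DOCS = [
    {"fuzzy": 3, "mining": 3},
    {"fuzzy": 3, "mining": 1},
    {"cluster": 5},
]
ITEMSETS = mine(DOCS, min_support=0.5)
```

In this sketch, each frequent itemset such as `(("fuzzy", "Mid"),)` would play the role of a candidate cluster label; how documents are then assigned to clusters and arranged into a tree is left out.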
Pages: 193-211
Page count: 19
Related papers
50 in total
  • [31] Mining maximal frequent itemsets with frequent pattern list
    Qian, Jin
    Ye, Feiyue
    FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 1, PROCEEDINGS, 2007, : 628 - 632
  • [32] High quality, efficient hierarchical document clustering using closed interesting itemsets
    Malik, Hassan H.
    Kender, John R.
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 991 - +
  • [33] Parallel algorithm for mining frequent itemsets
    Ruan, YL
    Liu, G
    Li, QH
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 2118 - 2121
  • [34] On Maximal Frequent Itemsets Mining with Constraints
    Jabbour, Said
    Mana, Fatima Ezzahra
    Dlala, Imen Ouled
    Raddaoui, Badran
    Sais, Lakhdar
    PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, 2018, 11008 : 554 - 569
  • [35] An Efficient Approach for Incremental Mining Fuzzy Frequent Itemsets with FP-Tree
    Huo, Weigang
    Fang, Xingjie
    Zhang, Zhiyuan
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2016, 24 (03) : 367 - 386
  • [36] Mining frequent itemsets with convertible constraints
    Pei, J
    Han, JW
    Lakshmanan, LVS
    17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001, : 433 - 442
  • [37] An Algorithm for Mining Frequent Closed Itemsets
    Zhang Tiejun
    Yang Junrui
    Wang Xiuqin
    2008 3RD INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEM AND KNOWLEDGE ENGINEERING, VOLS 1 AND 2, 2008, : 240 - +
  • [38] The research of sampling for mining frequent itemsets
    Hu, Xuegang
    Yu, Haitao
    ROUGH SETS AND KNOWLEDGE TECHNOLOGY, PROCEEDINGS, 2006, 4062 : 496 - 501
  • [39] Mining Frequent Itemsets in Real Time
    Nath, Nilanjana Dev
    Meena, M. Janaki
    Ibrahim, S. P. Syed
    PROCEEDINGS OF THE 3RD INTERNATIONAL SYMPOSIUM ON BIG DATA AND CLOUD COMPUTING CHALLENGES (ISBCC - 16'), 2016, 49 : 325 - 334
  • [40] An Algorithm of Mining Closed Frequent Itemsets
    Li, Haifeng
    PROCEEDINGS OF THE 2015 5TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCES AND AUTOMATION ENGINEERING, 2016, 42 : 95 - 98