A maximal frequent itemset approach for web document clustering

Cited by: 0
Authors
Zhuang, L [1]
Dai, HH [1]
Affiliation
[1] Deakin Univ, Sch Informat Technol, Burwood, Vic 3125, Australia
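Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code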
081104; 0812; 0835; 1405
Abstract
Clustering web documents efficiently and accurately is of great interest to web users and is a key component of the search accuracy of a web search engine. To this end, this paper introduces a new approach to web document clustering, called the Maximal Frequent Itemset (MFI) approach. Iterative clustering algorithms, such as K-means and Expectation-Maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely; these center points are then used as initial points for the K-means algorithm. Experimental results on three web document sets show that the MFI approach outperforms the other methods compared in most cases, particularly when the document sets contain a large number of categories.
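The idea of seeding K-means from maximal frequent itemsets can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how a center point is derived from an MFI, so the sketch assumes each seed is the mean vector of the documents that contain every term of one MFI. The brute-force level-wise miner, the `min_support` threshold, the toy documents, and all function names are hypothetical choices made for the example.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Level-wise (Apriori-style) search over documents given as sets of terms."""
    frequent = {}
    current = [frozenset([t]) for t in sorted({t for d in docs for t in d})]
    while current:
        counts = {c: sum(1 for d in docs if c <= d) for c in current}
        level = [c for c in current if counts[c] >= min_support]
        frequent.update((c, counts[c]) for c in level)
        # Join step: unions of two frequent k-itemsets that form a (k+1)-itemset.
        current = list({a | b for a, b in combinations(level, 2)
                        if len(a | b) == len(a) + 1})
    return frequent

def maximal_itemsets(frequent):
    """Keep itemsets with no frequent proper superset; sorted for determinism."""
    maximal = [s for s in frequent if not any(s < t for t in frequent)]
    return sorted(maximal, key=sorted)

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest(v, centers):
    dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centers]
    return dists.index(min(dists))

def kmeans(vectors, centers, iterations=10):
    """Plain K-means that starts from the supplied centers instead of random ones."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        # An empty group keeps its previous center.
        centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
    return centers, [nearest(v, centers) for v in vectors]

# Toy corpus (assumed data): three sports pages and three finance pages.
docs = [
    {"goal", "match", "team"},
    {"goal", "match", "coach"},
    {"goal", "match", "team", "coach"},
    {"stock", "market", "bank"},
    {"stock", "market", "price"},
    {"stock", "market", "bank", "price"},
]
vocab = sorted({t for d in docs for t in d})
vectors = [[1.0 if t in d else 0.0 for t in vocab] for d in docs]

mfis = maximal_itemsets(frequent_itemsets(docs, min_support=3))
# Assumption: one seed per MFI, the mean of the documents covering that MFI.
seeds = [centroid([v for v, d in zip(vectors, docs) if m <= d]) for m in mfis]
centers, labels = kmeans(vectors, seeds)
print([sorted(m) for m in mfis], labels)
```

In this sketch the number of clusters is not chosen in advance; it equals the number of MFIs found at the given support threshold. A real system would likely use weighted term vectors and a dedicated MFI miner rather than exhaustive candidate counting.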
Pages: 970-977
Number of pages: 8
Related papers
50 in total
  • [21] Fuzzy Maximal Frequent Itemset Mining Over Quantitative Databases
    Li, Haifeng
    Wang, Yue
    Zhang, Ning
    Zhang, Yuejin
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2017, PT I, 2017, 10191 : 476 - 486
  • [22] An algorithm for mining constrained maximal frequent itemset in uncertain data
    Du, Haizhou
    Journal of Information and Computational Science, 2012, 9 (15): 4509 - 4515
  • [23] A Theoretical Comparison of Two Maximal Frequent Itemset Mining Algorithms
    Li, Haifeng
    PROCEEDINGS OF THE 2016 5TH INTERNATIONAL CONFERENCE ON ADVANCED MATERIALS AND COMPUTER SCIENCE, 2016, 80 : 363 - 366
  • [24] Frequent Term Based Text Document Clustering: A New Approach
    Kumar, Manoj
    Yadav, D. K.
    Gupta, Vijay Kumar
    2015 INTERNATIONAL CONFERENCE ON SOFT COMPUTING TECHNIQUES AND IMPLEMENTATIONS (ICSCTI), 2015
  • [25] Maximal Frequent Sequences for Document Classification
    Hai Nguyen Thi Tuyet
    Tan Hanh
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES FOR COMMUNICATIONS (ATC), 2016: 152 - 157
  • [26] SiBIC: A Web Server for Generating Gene Set Networks Based on Biclusters Obtained by Maximal Frequent Itemset Mining
    Takahashi, Kei-ichiro
    Takigawa, Ichigaku
    Mamitsuka, Hiroshi
    PLOS ONE, 2013, 8 (12)
  • [27] Cluster analysis of PM2.5 pollution in China using the frequent itemset clustering approach
    Zhang, Liankui
    Yang, Guangfei
    ENVIRONMENTAL RESEARCH, 2022, 204
  • [28] A Novel Modified Apriori Approach for Web Document Clustering
    Roul, Rajendra Kumar
    Varshneya, Saransh
    Kalra, Ashu
    Sahay, Sanjay Kumar
    COMPUTATIONAL INTELLIGENCE IN DATA MINING, VOL 3, 2015, 33
  • [29] Phrase Based Web Document Clustering: An Indexing Approach
    Singh, Amit Prakash
    Srivastava, Shalini
    Sahu, Sanjib Kumar
    COMPUTER COMMUNICATION, NETWORKING AND INTERNET SECURITY, 2017, 5 : 481 - 492
  • [30] MMCDM Based Approach for Efficient Web Document Clustering in Web Search
    Siva, R.
    Thandapani, T.
    Ramesh, R.
    Balamurali, R.
    BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2020, 13 (03): 82 - 88