A maximal frequent itemset approach for web document clustering

Cited by: 0
Authors
Zhuang, L [1]
Dai, HH [1]
Affiliation
[1] Deakin Univ, Sch Informat Technol, Burwood, Vic 3125, Australia
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering web documents efficiently yet accurately is of great interest to web users and is a key component of the search accuracy of a web search engine. To achieve this, this paper introduces a new approach to clustering web documents, called the Maximal Frequent Itemset (MFI) approach. Iterative clustering algorithms, such as K-means and Expectation-Maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely. These center points are then used as initial points for the K-means algorithm. Our experimental results on three web document sets show that the MFI approach outperforms the other methods we compared in most cases, particularly when the document sets contain a large number of categories.
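The abstract does not say how the center points are derived from the maximal frequent itemsets, so the following is only a minimal sketch of the general idea under stated assumptions: documents are sets of terms, itemsets are mined by brute force, each seed is the mean binary term vector of the documents that contain one maximal frequent itemset, and those seeds initialize a plain K-means loop. All function names, the toy documents, and the `min_support` value are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def maximal_frequent_itemsets(docs, min_support):
    """Brute-force mining; only suitable for tiny vocabularies."""
    vocab = sorted(set().union(*docs))
    frequent = []
    for size in range(1, len(vocab) + 1):
        level = [frozenset(c) for c in combinations(vocab, size)
                 if sum(1 for d in docs if set(c) <= d) >= min_support]
        if not level:  # no frequent itemset of this size, so none larger
            break
        frequent.extend(level)
    # An itemset is maximal if it has no frequent proper superset.
    return [s for s in frequent if not any(s < t for t in frequent)]

def to_vector(doc, vocab):
    """Binary term vector (an assumption; the paper may weight terms)."""
    return [1.0 if term in doc else 0.0 for term in vocab]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mfi_seeds(docs, vocab, min_support):
    """Assumed seeding rule: one seed per maximal frequent itemset,
    the mean vector of the documents that contain it."""
    seeds = []
    for itemset in maximal_frequent_itemsets(docs, min_support):
        covered = [to_vector(d, vocab) for d in docs if itemset <= d]
        seeds.append(mean(covered))
    return seeds

def kmeans(vectors, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    for _ in range(max_iter):
        labels = [min(range(len(centers)),
                      key=lambda k: sq_dist(v, centers[k]))
                  for v in vectors]
        new_centers = []
        for k, old in enumerate(centers):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            new_centers.append(mean(members) if members else old)
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers

# Toy corpus with two topics (hypothetical data).
docs = [
    {"goal", "match", "team"},
    {"goal", "match", "coach"},
    {"goal", "match", "team"},
    {"stock", "market", "bank"},
    {"stock", "market", "rate"},
    {"stock", "market", "bank"},
]
vocab = sorted(set().union(*docs))
mfis = maximal_frequent_itemsets(docs, min_support=3)
seeds = mfi_seeds(docs, vocab, min_support=3)
labels, centers = kmeans([to_vector(d, vocab) for d in docs], seeds)
print(mfis, labels)
```

In this sketch the number of clusters equals the number of maximal frequent itemsets found, which is a simplification; a real system would need a scalable miner and some rule for selecting or merging seeds.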
Pages: 970-977
Number of pages: 8
Related papers
50 in total
  • [1] An efficient and resilience linear prefix approach for mining maximal frequent itemset using clustering
    Sinthuja, M.
    Pravinthraja, S.
    Dhanalakshmi, B. K.
    Gururaj, H. L.
    Ravi, Vinayakumar
    Lal, G. Jyothish
    JOURNAL OF SAFETY SCIENCE AND RESILIENCE, 2025, 6 (01): 93-104
  • [2] A maximal frequent itemset algorithm
    Wang, H
    Li, QH
    Ma, CX
    Li, KL
    ROUGH SETS, FUZZY SETS, DATA MINING, AND GRANULAR COMPUTING, 2003, 2639: 484-490
  • [3] Document clustering based on maximal frequent sequences
    Hernandez-Reyes, Edith
    Garcia-Hernandez, Rene A.
    Carrasco-Ochoa, J. A.
    Martinez-Trinidad, J. Fco.
    ADVANCES IN NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4139: 257-267
  • [4] Frequent Itemset Mining for Clustering Near Duplicate Web Documents
    Ignatov, Dmitry I.
    Kuznetsov, Sergei O.
    CONCEPTUAL STRUCTURES: LEVERAGING SEMANTIC TECHNOLOGIES, PROCEEDINGS, 2009, 5662: 185-200
  • [5] A Frequent and Rare Itemset Mining Approach to Transaction Clustering
    Tummala, Kuladeep
    Oswald, C.
    Sivaselvan, B.
    DATA SCIENCE ANALYTICS AND APPLICATIONS, DASAA 2017, 2018, 804: 8-18
  • [6] MAFIA: A maximal frequent itemset algorithm
    Burdick, D
    Calimlim, M
    Flannick, J
    Gehrke, J
    Yiu, TM
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (11): 1490-1504
  • [7] FICWAN Frequent Itemset Clustering of Web Articles by Analyzing the Article Neighborhood
    Kucecka, Tomas
    Chuda, Daniela
    Sladecek, Peter
    14TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS (CINTI), 2013: 509-514
  • [8] Frequent Itemset Based Hierarchical Document Clustering Using Wikipedia as External Knowledge
    Kiran, G. V. R.
    Shankar, Ravi
    Pudi, Vikram
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT II, 2010, 6277: 11-20
  • [9] The Discussions of Maximal Frequent Itemset Mining Optimization
    Li, Haifeng
    2016 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE AND INTERNET TECHNOLOGY (CII 2016), 2016: 96-100
  • [10] An Active Learning Approach to Frequent Itemset-Based Text Clustering
    Marcacini, Ricardo M.
    Correa, Geraldo N.
    Rezende, Solange O.
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012: 3529-3532