A maximal frequent itemset approach for web document clustering

Cited by: 0

Authors:

Zhuang, L [1]

Dai, HH [1]

Affiliations:

[1] Deakin Univ, Sch Informat Technol, Burwood, Vic 3125, Australia

Source:

FOURTH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS | 2004

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Clustering web documents efficiently yet accurately is of great interest to web users and is a key component of the search accuracy of a web search engine. To achieve this, this paper introduces a new approach for clustering web documents, called the Maximal Frequent Itemset (MFI) approach. Iterative clustering algorithms, such as K-means and Expectation-Maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely. These center points are then used as initial points for the K-means algorithm. Experimental results on 3 web document sets show that the MFI approach outperforms the other methods compared in most cases, particularly when the document sets contain a large number of categories.
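The idea summarized in the abstract (maximal frequent itemsets locate dense regions, whose centers then seed K-means) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the mining algorithm, the document representation, or the support threshold, so the brute-force level-wise search, the binary term vectors, the toy documents, `min_support=3`, and all function names below are assumptions made purely for illustration.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Level-wise brute-force search for frequent term sets (toy data only)."""
    vocab = sorted(set().union(*docs))
    frequent = []
    current = [frozenset([t]) for t in vocab]
    while current:
        # Keep candidates contained in at least min_support documents.
        level = [s for s in current if sum(s <= d for d in docs) >= min_support]
        frequent.extend(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        current = list({a | b for a, b in combinations(level, 2)
                        if len(a | b) == len(a) + 1})
    return frequent

def maximal(itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]

def to_vector(doc, vocab):
    """Binary term vector (an assumed representation)."""
    return [1.0 if t in doc else 0.0 for t in vocab]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, centers, iterations=10):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)),
                      key=lambda c: sum((p - q) ** 2
                                        for p, q in zip(x, centers[c])))
                  for x in points]
        for c in range(len(centers)):
            members = [x for x, l in zip(points, labels) if l == c]
            if members:
                centers[c] = mean(members)
    return labels

# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"python", "code", "loop"},
    {"python", "code", "class"},
    {"python", "code", "loop"},
    {"soccer", "goal", "team"},
    {"soccer", "goal", "match"},
    {"soccer", "goal", "team"},
]
vocab = sorted(set().union(*docs))
points = [to_vector(d, vocab) for d in docs]

# Each maximal frequent itemset yields one seed: the mean of the
# documents that contain every term of the itemset.
mfis = sorted(maximal(frequent_itemsets(docs, min_support=3)), key=sorted)
seeds = [mean([p for p, d in zip(points, docs) if m <= d]) for m in mfis]

labels = kmeans(points, seeds)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this sketch the number of clusters is simply the number of maximal frequent itemsets found, which is a simplification; how the paper selects or merges seeds is not stated in the abstract.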


Pages: 970-977

Page count: 8

50 items in total

[1] An efficient and resilience linear prefix approach for mining maximal frequent itemset using clustering
Sinthuja, M.
Pravinthraja, S.
Dhanalakshmi, B. K.
Gururaj, H. L.
Ravi, Vinayakumar
Lal, G. Jyothish
JOURNAL OF SAFETY SCIENCE AND RESILIENCE, 2025, 6 (01) : 93 - 104
[2] A maximal frequent itemset algorithm
Wang, H
Li, QH
Ma, CX
Li, KL
ROUGH SETS, FUZZY SETS, DATA MINING, AND GRANULAR COMPUTING, 2003, 2639 : 484 - 490
[3] Document clustering based on maximal frequent sequences
Hernandez-Reyes, Edith
Garcia-Hernandez, Rene A.
Carrasco-Ochoa, J. A.
Martinez-Trinidad, J. Fco.
ADVANCES IN NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2006, 4139 : 257 - 267
[4] Frequent Itemset Mining for Clustering Near Duplicate Web Documents
Ignatov, Dmitry I.
Kuznetsov, Sergei O.
CONCEPTUAL STRUCTURES: LEVERAGING SEMANTIC TECHNOLOGIES, PROCEEDINGS, 2009, 5662 : 185 - 200
[5] A Frequent and Rare Itemset Mining Approach to Transaction Clustering
Tummala, Kuladeep
Oswald, C.
Sivaselvan, B.
DATA SCIENCE ANALYTICS AND APPLICATIONS, DASAA 2017, 2018, 804 : 8 - 18
[6] MAFIA: A maximal frequent itemset algorithm
Burdick, D
Calimlim, M
Flannick, J
Gehrke, J
Yiu, TM
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (11) : 1490 - 1504
[7] FICWAN Frequent Itemset Clustering of Web Articles by Analyzing the Article Neighborhood
Kucecka, Tomas
Chuda, Daniela
Sladecek, Peter
14TH IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND INFORMATICS (CINTI), 2013 : 509 - 514
[8] Frequent Itemset Based Hierarchical Document Clustering Using Wikipedia as External Knowledge
Kiran, G. V. R.
Shankar, Ravi
Pudi, Vikram
KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT II, 2010, 6277 : 11 - 20
[9] The Discussions of Maximal Frequent Itemset Mining Optimization
Li, Haifeng
2016 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING, INFORMATION SCIENCE AND INTERNET TECHNOLOGY (CII 2016), 2016 : 96 - 100
[10] An Active Learning Approach to Frequent Itemset-Based Text Clustering
Marcacini, Ricardo M.
Correa, Geraldo N.
Rezende, Solange O.
2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012 : 3529 - 3532
