A maximal frequent itemset approach for web document clustering

Cited by: 0

Authors:

Zhuang, L [1]

Dai, HH [1]

Affiliations:

[1] Deakin Univ, Sch Informat Technol, Burwood, Vic 3125, Australia

Source:

FOURTH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS | 2004

Keywords: (none listed)

DOI: not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Clustering web documents efficiently yet accurately is of great interest to web users and is a key component of the search accuracy of a web search engine. To this end, this paper introduces a new approach to clustering web documents, called the Maximal Frequent Itemset (MFI) approach. Iterative clustering algorithms, such as K-means and Expectation-Maximization (EM), are sensitive to their initial conditions. The MFI approach first locates the center points of high-density clusters precisely. These center points are then used as initial points for the K-means algorithm. Experimental results on three web document sets show that the MFI approach outperforms the other methods compared in most cases, particularly when the document sets contain a large number of categories.
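The abstract does not say how center points are derived from the maximal frequent itemsets, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each document is a set of terms, mines frequent term sets by brute force, keeps the maximal ones, and takes the mean vector of the documents covering each maximal itemset as a K-means seed. All names (`frequent_itemsets`, `mfi_seeds`, `min_support`, the toy documents) are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support):
    """Brute-force frequent term sets; only suitable for tiny toy inputs."""
    vocab = sorted(set().union(*docs))
    frequent = []
    for size in range(1, len(vocab) + 1):
        found = False
        for items in combinations(vocab, size):
            s = frozenset(items)
            if sum(1 for d in docs if s <= d) >= min_support:
                frequent.append(s)
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size
    return frequent

def maximal_itemsets(frequent):
    """Keep frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]

def to_vector(doc, vocab):
    return [1.0 if term in doc else 0.0 for term in vocab]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mfi_seeds(docs, vocab, mfis):
    """Assumed seeding rule: one seed per MFI, the mean of the documents containing it."""
    seeds = []
    for s in sorted(mfis, key=sorted):
        covered = [to_vector(d, vocab) for d in docs if s <= d]
        seeds.append(mean(covered))
    return seeds

def kmeans(vectors, seeds, iterations=10):
    """Plain K-means that starts from the supplied seeds instead of random points."""
    centers = [list(c) for c in seeds]

    def assign():
        return [min(range(len(centers)), key=lambda k: sq_dist(v, centers[k]))
                for v in vectors]

    for _ in range(iterations):
        labels = assign()
        for k in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centers[k] = mean(members)
    return assign()

# Toy data (made up for illustration): two obvious topics.
docs = [
    {"python", "code", "bug"},
    {"python", "code", "loop"},
    {"python", "code", "test"},
    {"soccer", "goal", "match"},
    {"soccer", "goal", "team"},
    {"soccer", "goal", "cup"},
]
vocab = sorted(set().union(*docs))
mfis = maximal_itemsets(frequent_itemsets(docs, min_support=3))
seeds = mfi_seeds(docs, vocab, mfis)
labels = kmeans([to_vector(d, vocab) for d in docs], seeds)
print(labels)
```

In this sketch the number of clusters is whatever number of maximal frequent itemsets the support threshold yields; a real implementation would need a dedicated MFI miner and a rule for selecting or merging seeds, neither of which this record describes.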

Pages: 970-977

Page count: 8

