Link-Based Clustering Algorithm for Clustering Web Documents

Times cited: 2

Authors:

Ashokkumar, P. [1]

Don, S. [2]

Affiliations:

[1] VIT, Sch Comp Sci & Engn, Near Katpadi Rd, Vellore 632014, Tamil Nadu, India

[2] VIT, Sch Comp Sci & Engn, TIFAC CORE Automot Infotron, Near Katpadi Rd, Vellore 632014, Tamil Nadu, India

Source:

JOURNAL OF TESTING AND EVALUATION | 2019 / Vol. 47 / No. 6

Keywords:

clustering; web documents clustering; web link clustering;

DOI:

10.1520/JTE20180497

Chinese Library Classification (CLC) number:

TB3 [Engineering materials science]

Discipline classification codes:

0805; 080502

Abstract:

Clustering web documents involves feeding a large number of words into clustering algorithms and similarity measures such as K-Means, cosine similarity, and Latent Dirichlet Allocation. As the number of words in each document grows, the clustering process becomes increasingly time-consuming. Many web documents contain web links alongside their content, and the text of these links can carry a great deal of information useful for clustering. In this work, we show that using the web link text alone yields better clustering efficiency than using the whole document text. We evaluated our algorithm on two benchmark datasets, and the results show that it achieves higher clustering efficiency than existing methods.
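The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes, assuming that "web link text" means the visible text inside HTML `<a>` elements. It builds term-count vectors from link text only and groups them with a k-means variant that assigns documents by cosine similarity. The helper names (`link_terms`, `kmeans_cosine`, `farthest_first_seeds`), the farthest-first seeding, and the sample documents are hypothetical illustrations, not the authors' method or data.

```python
from collections import Counter
from html.parser import HTMLParser
import math
import re


class LinkTextExtractor(HTMLParser):
    """Collects the visible text inside <a> elements (assumed 'web link text')."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # > 0 while the parser is inside an <a> element
        self.link_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.link_texts.append(data)


def link_terms(html):
    """Term counts built from the anchor text only; body text is ignored."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.link_texts).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def farthest_first_seeds(vectors, k):
    """Hypothetical seeding: start with doc 0, then repeatedly add the doc
    least similar to every seed chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        nxt = min((i for i in range(len(vectors)) if i not in chosen),
                  key=lambda i: max(cosine(vectors[i], vectors[j]) for j in chosen))
        chosen.append(nxt)
    return [Counter(vectors[i]) for i in chosen]


def kmeans_cosine(vectors, k, iterations=10):
    """k-means variant: assign each doc to the most cosine-similar centroid,
    then recompute each centroid as the summed term counts of its members."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of documents")
    centroids = farthest_first_seeds(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        sums = [Counter() for _ in range(k)]
        for v, c in zip(vectors, labels):
            sums[c].update(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [sums[c] if sums[c] else centroids[c] for c in range(k)]
    return labels


# Toy pages (invented for illustration): two sports pages, two Python pages.
docs = [
    '<p>Intro text</p><a href="/a">football scores</a> <a href="/b">league table</a>',
    '<p>Other</p><a href="/c">football transfer news</a>',
    '<a href="/d">python tutorial</a> <a href="/e">python library docs</a>',
    '<a href="/f">learn python programming</a>',
]
vectors = [link_terms(d) for d in docs]
print(kmeans_cosine(vectors, k=2))  # → [0, 0, 1, 1]
```

Because only anchor text is tokenized, the paragraph text ("Intro text", "Other") never enters the vectors, which is the kind of input reduction the abstract argues for. A production pipeline would more likely use TF-IDF weighting and a library k-means implementation; plain counts are used here only to keep the sketch dependency-free.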

Pages: 4096-4107

Page count: 12
