Link-Based Clustering Algorithm for Clustering Web Documents

Cited by: 2
Authors
Ashokkumar, P. [1 ]
Don, S. [2 ]
Affiliations
[1] VIT, Sch Comp Sci & Engn, Near Katpadi Rd, Vellore 632014, Tamil Nadu, India
[2] VIT, Sch Comp Sci & Engn, TIFAC CORE Automot Infotron, Near Katpadi Rd, Vellore 632014, Tamil Nadu, India
Keywords
clustering; web documents clustering; web link clustering;
DOI
10.1520/JTE20180497
Chinese Library Classification number
TB3 [Engineering Materials Science];
Discipline classification code
0805 ; 080502 ;
Abstract
Clustering web documents requires feeding a large number of words into clustering methods such as K-Means, Cosine Similarity, Latent Dirichlet Allocation, and others. As a result, the clustering process takes longer as the number of words in each document increases. Many web documents contain web links alongside their content, and the text of these links may carry a great deal of information useful for clustering. In our work, we show that using the web link text alone gives better clustering efficiency than considering the whole document text. We evaluated our algorithm on two benchmark datasets, and the results show that it improves clustering efficiency more than existing methods.
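The idea in the abstract, comparing documents by their link text rather than their full text, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that "web link text" means the visible text inside `<a>` elements, and it uses plain word counts with cosine similarity as a stand-in for whatever representation and clustering step the paper uses. The class name `LinkTextExtractor`, the helper functions, and the toy pages are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import math


class LinkTextExtractor(HTMLParser):
    """Collects the visible text found inside <a> ... </a> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # greater than 0 while inside an <a> element
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Keep only text that appears inside a link; body text is ignored.
        if self._depth > 0:
            self.words.extend(data.lower().split())


def link_text_vector(html):
    """Return a bag-of-words Counter built from link text only."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return Counter(parser.words)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy pages: the body text is ignored, only link text is used.
pages = {
    "sports_1": '<p>Long body text about many things.</p>'
                '<a href="/f">football scores</a> <a href="/t">tennis results</a>',
    "sports_2": '<p>Another long article body.</p>'
                '<a href="/f2">football transfers</a> <a href="/s">tennis scores</a>',
    "cooking_1": '<p>Yet more unrelated body text.</p>'
                 '<a href="/r">pasta recipes</a> <a href="/b">baking bread</a>',
}
vectors = {name: link_text_vector(html) for name, html in pages.items()}
```

Because each vector holds only a handful of link words, the pairwise similarities that a clustering step would consume are cheap to compute. Whether this also yields better clusters than full-text vectors is the paper's empirical claim, and this sketch does not test it.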
Pages: 4096-4107
Page count: 12
Related papers
50 in total
  • [1] Density link-based methods for clustering web pages
    Chehreghani, Morteza Haghir
    Abolhassani, Hassan
    Chehreghani, Mostafa Haghir
    [J]. DECISION SUPPORT SYSTEMS, 2009, 47 (04) : 374 - 382
  • [2] Use link-based clustering to improve Web search results
    Wang, YT
    Kitsuregawa, M
    [J]. SECOND INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS ENGINEERING, VOL I, PROCEEDINGS, 2002, : 115 - 124
  • [3] Link-based multi-verse optimizer for text documents clustering
    Abasi, Ammar Kamal
    Khader, Ahamad Tajudin
    Al-Betar, Mohammed Azmi
    Naim, Syibrah
    Makhadmeh, Sharif Naser
    Alyasseri, Zaid Abdi Alkareem
    [J]. APPLIED SOFT COMPUTING, 2020, 87
  • [4] Link-based similarity measures for the classification of Web documents
    Calado, P
    Cristo, M
    Gonçalves, MA
    de Moura, ES
    Ribeiro-Neto, B
    Ziviani, N
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2006, 57 (02): : 208 - 221
  • [5] A term-based algorithm for hierarchical clustering of web documents
    Schenker, A
    Last, M
    Kandel, A
    [J]. JOINT 9TH IFSA WORLD CONGRESS AND 20TH NAFIPS INTERNATIONAL CONFERENCE, PROCEEDINGS, VOLS. 1-5, 2001, : 3076 - 3081
  • [6] A Neighborhood Search Method for Link-Based Tag Clustering
    Cui, Jianwei
    Li, Pei
    Liu, Hongyan
    He, Jun
    Du, Xiaoyong
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2009, 5678 : 91 - +
  • [7] Correlation Clustering Based on Genetic Algorithm for Documents Clustering
    Zhang, Zhenya
    Cheng, Hongmei
    Chen, Wanli
    Zhang, Shuguang
    Fang, Qiansheng
    [J]. 2008 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-8, 2008, : 3193 - +
  • [8] Clustering Aggregation Based on Genetic Algorithm for Documents Clustering
    Zhang, Zhenya
    Cheng, Hongmei
    Zhang, Shuguang
    Chen, Wanli
    Fang, Qiansheng
    [J]. 2008 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-8, 2008, : 3156 - +
  • [9] Semantic based clustering of web documents
    Lin, TY
    Chiang, IJ
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2005, : 189 - 192
  • [10] Clustering template based web documents
    Gottron, Thomas
    [J]. ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 40 - 51