A probabilistic relational approach for web document clustering

Times cited: 9
Authors
Fersini, E. [1 ]
Messina, E. [1 ]
Archetti, F. [1 ,2 ]
Affiliations
[1] Univ Milano Bicocca, Dipartimento Informat Sistemist & Comunicaz, Milan, Italy
[2] Consorzio Milano Ric, I-20126 Milan, Italy
Keywords
Relational document clustering; Relational web structure estimation
DOI
10.1016/j.ipm.2009.08.003
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The exponential growth of information available on the World Wide Web and retrievable by search engines has created the need for efficient and effective methods of organizing relevant content. In this context, document clustering plays an important role and remains an interesting and challenging problem in web computing. In this paper we present a document clustering method that takes into account both the content and the hyperlink structure of a web page collection, where each document is viewed as a set of semantic units. We exploit this representation to determine the strength of the relation between two linked pages and to define a relational clustering algorithm based on a probabilistic graph representation. The experimental results show that the proposed approach, called RED-clustering, outperforms two of the best-known clustering algorithms, k-Means and Expectation Maximization. (C) 2009 Elsevier Ltd. All rights reserved.
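The abstract describes weighting hyperlinks by the content of the pages they join and then clustering over the resulting graph. The following is a minimal sketch of that general idea only, not RED-clustering itself. It assumes that each page is a list of semantic units and each unit is a bag of words; that the strength of a link is a symmetrized best-match cosine similarity between the units of the two pages (a hypothetical choice); and that clustering is done with a deterministic threshold plus union-find, swapped in for the paper's probabilistic graph model. The names `link_strength`, `cluster`, and `threshold` are hypothetical.

```python
import math
from collections import Counter

# Assumption: a page is a list of "semantic units", each a bag of words.
# This is an illustrative stand-in, not the paper's exact representation.
Page = list[Counter]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_strength(p: Page, q: Page) -> float:
    """Hypothetical relation strength of a hyperlink between pages p and q:
    each unit is matched to its most similar unit on the other page, the
    best-match scores are averaged, and both directions are averaged."""
    if not p or not q:
        return 0.0
    pq = sum(max(cosine(u, v) for v in q) for u in p) / len(p)
    qp = sum(max(cosine(v, u) for u in p) for v in q) / len(q)
    return (pq + qp) / 2


def cluster(pages: dict[str, Page], links: list[tuple[str, str]],
            threshold: float) -> list[set[str]]:
    """Merge pages joined by a hyperlink whose strength is >= threshold.
    Deterministic union-find, used here in place of a probabilistic model."""
    parent = {name: name for name in pages}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        if link_strength(pages[a], pages[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in pages:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    pages = {
        "a": [Counter("clustering web documents".split())],
        "b": [Counter("web documents clustering algorithm".split())],
        "c": [Counter("football match results".split())],
    }
    links = [("a", "b"), ("b", "c")]
    print(cluster(pages, links, threshold=0.5))
```

In this toy example the link a-b has strength about 0.87 and is kept, while b-c shares no terms and is dropped, so pages a and b end up together and c stays alone. A content-only method such as k-Means would ignore the links entirely; the point of the sketch is that hyperlinks only contribute when the linked content supports them.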
Pages: 117-130
Number of pages: 14
Related papers
  • [1] Web Service Clustering Using Relational Database Approach
    Liu, Jianxiao
    Liu, Feng
    Li, Xiaoxia
    He, Keqing
    Ma, Yutao
    Wang, Jian
    [J]. INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2015, 25 (08) : 1365 - 1393
  • [2] A Probabilistic Framework for Relational Clustering
    Long, Bo
    Zhang, Zhongfei
    Yu, Philip S.
    [J]. KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 470 - 479
  • [3] A maximal frequent itemset approach for web document clustering
    Zhuang, L
    Dai, HH
    [J]. FOURTH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2004, : 970 - 977
  • [4] A Novel Modified Apriori Approach for Web Document Clustering
    Roul, Rajendra Kumar
    Varshneya, Saransh
    Kalra, Ashu
    Sahay, Sanjay Kumar
    [J]. COMPUTATIONAL INTELLIGENCE IN DATA MINING, VOL 3, 2015, 33
  • [5] Phrase Based Web Document Clustering: An Indexing Approach
    Singh, Amit Prakash
    Srivastava, Shalini
    Sahu, Sanjib Kumar
    [J]. COMPUTER COMMUNICATION, NETWORKING AND INTERNET SECURITY, 2017, 5 : 481 - 492
  • [6] MMCDM Based Approach for Efficient Web Document Clustering in Web Search
    Siva, R.
    Thandapani, T.
    Ramesh, R.
    Balamurali, R.
    [J]. BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2020, 13 (03) : 82 - 88
  • [7] Web Document Clustering Approach using WordNet Lexical Categories and Fuzzy Clustering
    Gharib, Tarek F.
    Fouad, Mohammed M.
    Aref, Mostafa M.
    [J]. 2008 11TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY: ICCIT 2008, VOLS 1 AND 2, 2008, : 55 - +
  • [8] Web mining with relational clustering
    Runkler, TA
    Bezdek, JC
    [J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2003, 32 (2-3) : 217 - 236
  • [9] Probabilistic Relational Models with Clustering Uncertainty
    Coutant, Anthony
    Leray, Philippe
    Le Capitaine, Hoel
    [J]. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
  • [10] Leveraging Probabilistic Segmentation to Document Clustering
    Banerjee, Arko
    [J]. 2015 EIGHTH INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING (IC3), 2015, : 82 - 87