A probabilistic relational approach for web document clustering

Cited by: 9
Authors
Fersini, E. [1 ]
Messina, E. [1 ]
Archetti, F. [1 ,2 ]
Affiliations
[1] Univ Milano Bicocca, Dipartimento Informat Sistemist & Comunicaz, Milan, Italy
[2] Consorzio Milano Ric, I-20126 Milan, Italy
Keywords
Relational document clustering; Relational web structure estimation
DOI
10.1016/j.ipm.2009.08.003
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The exponential growth of information available on the World Wide Web and retrievable by search engines has made it necessary to develop efficient and effective methods for organizing relevant content. Document clustering plays an important role here and remains an interesting and challenging problem in web computing. In this paper we present a document clustering method that takes into account both the content information and the hyperlink structure of a web page collection, where a document is viewed as a set of semantic units. We exploit this representation to determine the strength of the relation between two linked pages and to define a relational clustering algorithm based on a probabilistic graph representation. The experimental results show that the proposed approach, called RED-clustering, outperforms two of the best-known clustering algorithms, k-Means and Expectation Maximization. (C) 2009 Elsevier Ltd. All rights reserved.
Pages: 117-130
Number of pages: 14
Related papers
50 in total
  • [21] An Evolutionary Approach for Document Clustering
    Akter, Ruksana
    Chung, Yoojin
    [J]. 2013 INTERNATIONAL CONFERENCE ON ELECTRONIC ENGINEERING AND COMPUTER SCIENCE (EECS 2013), 2013, 4 : 370 - 375
  • [22] WEB PAGE CLASSIFICATION THROUGH PROBABILISTIC RELATIONAL MODELS
    Fersini, Elisabetta
    Messina, Enza
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2013, 27 (04)
  • [23] Probabilistic Relational Models with Relational Uncertainty: An Early Study in Web Page Classification
    Fersini, E.
    Messina, E.
    Archetti, F.
    [J]. 2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 3, 2009, : 139 - 142
  • [24] Web search result refinement by document clustering
    Tsui, Ming Hei
    Lim, Bresley
    Shi, Daming
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-8, 2007, : 2224 - 2229
  • [25] PROBABILISTIC VALIDATION APPROACH FOR CLUSTERING
    HAREVEN, M
    BRAILOVSKY, VL
    [J]. PATTERN RECOGNITION LETTERS, 1995, 16 (11) : 1189 - 1196
  • [26] Web document clustering using hyperlink structures
    He, X
    Zha, HY
    Ding, CHQ
    Simon, HD
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2002, 41 (01) : 19 - 45
  • [27] Digital Web Library of a Website with Document Clustering
    Mahecha-Nieto, Isabel
    Leon, Elizabeth
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2010, 2010, 6433 : 214 - 223
  • [28] Mining a Web citation database for document clustering
    He, Y
    Hui, SC
    Fong, ACM
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2002, 16 (04) : 283 - 302
  • [29] Unsupervised clustering for nontextual web document classification
    Chan, SWK
    Chong, MWC
    [J]. DECISION SUPPORT SYSTEMS, 2004, 37 (03) : 377 - 396
  • [30] A Feature Selection for Korean Web Document Clustering
    Park, Heum
    Kim, Young-Gi
    Kwon, Hyuk-Chul
    [J]. IECON 2004: 30TH ANNUAL CONFERENCE OF IEEE INDUSTRIAL ELECTRONICS SOCIETY, VOL 3, 2004, : 2650 - 2654