PROBABILISTIC HEURISTICS FOR HIERARCHICAL WEB DATA CLUSTERING

Cited by: 2
Authors
Chehreghani, Morteza Haghir [1]
Chehreghani, Mostafa Haghir [1]
Abolhassani, Hassan [1]
Affiliations
[1] Sharif Univ Technol, Fac Comp Engn, Web Intelligence Lab, Dept Comp Engn, Tehran, Iran
Keywords
data mining; Web clustering; Bayesian networks; hierarchical clustering; representative point;
DOI
10.1111/j.1467-8640.2012.00414.x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering Web data is an important technique for extracting knowledge from the Web. This paper presents a novel method to facilitate that clustering. The method determines an appropriate number of clusters and provides suitable representatives for each cluster by inference from a Bayesian network. The Bayesian network is also used to convert the contents of Web pages into lower-dimensional vectors. The method is extended to hierarchical clustering, and a heuristic is developed to select a good hierarchy. The experimental results show that the clusters produced are of high quality.
Pages: 209-233
Page count: 25