A unified probabilistic framework for clustering correlated heterogeneous web objects

Authors
Liu, GW [1]
Zhu, WB [1]
Yu, Y [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200030, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Most existing algorithms cluster highly correlated data objects (e.g., web pages and web queries) separately. Other algorithms do take the relationships between data objects into account, but they either integrate content and link features into a unified feature space or apply a hard clustering algorithm, which makes it difficult to fully exploit the correlated information across heterogeneous Web objects. In this paper, we propose a novel unified probabilistic framework that iteratively clusters correlated heterogeneous data objects until convergence. Our approach introduces two latent clustering layers, which serve as two mixture probabilistic models of the features. In each clustering iteration, we use the EM (Expectation-Maximization) algorithm to estimate the parameters of the mixture model in one latent layer and propagate them to the other. The experimental results show that our approach effectively combines the content and link features and improves clustering performance.
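The alternating scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: the multinomial mixture, the way soft memberships are propagated through the link matrix, and all names (`em_soft_cluster`, `co_cluster`, `links`, `content_a`, `content_b`) are assumptions made only for illustration.

```python
import numpy as np

def em_soft_cluster(views, k, resp=None, n_iter=30, seed=0):
    """Fit a k-component multinomial mixture by EM over one or more
    nonnegative feature views; return (n, k) soft memberships.
    Hypothetical helper, not taken from the paper."""
    n = views[0].shape[0]
    if resp is None:
        # Random soft initialisation: each row is a point on the simplex.
        resp = np.random.default_rng(seed).dirichlet(np.ones(k), size=n)
    for _ in range(n_iter):
        # M-step: mixing weights from the current soft memberships.
        pi = resp.sum(axis=0) + 1e-9
        log_p = np.tile(np.log(pi / pi.sum()), (n, 1))
        for X in views:
            # M-step: smoothed per-cluster feature distribution for this view.
            theta = resp.T @ X + 1e-2
            theta /= theta.sum(axis=1, keepdims=True)
            # E-step contribution: multinomial log-likelihood of each row.
            log_p += X @ np.log(theta).T
        # E-step: normalise posteriors in a numerically stable way.
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp

def co_cluster(content_a, content_b, links, k_a, k_b, n_rounds=5):
    """Alternate EM between two latent layers (assumed scheme): each layer
    sees its own content plus link mass projected onto the other layer's
    soft clusters."""
    resp_a = None
    resp_b = em_soft_cluster([content_b], k_b, seed=1)  # content-only start
    for _ in range(n_rounds):
        resp_a = em_soft_cluster([content_a, links @ resp_b], k_a, resp=resp_a)
        resp_b = em_soft_cluster([content_b, links.T @ resp_a], k_b, resp=resp_b)
    return resp_a, resp_b
```

Here `links[i, j]` is assumed to count associations between object `i` of one type (say, a web page) and object `j` of the other (say, a query). Memberships stay soft throughout, so uncertainty in one layer carries over to the other instead of being discarded by a hard assignment.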
Pages: 76-87
Page count: 12
Related papers
50 in total
  • [21] Effective and Efficient Clustering Methods for Correlated Probabilistic Graphs
    Gu, Yu
    Gao, Chunpeng
    Cong, Gao
    Yu, Ge
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (05) : 1117 - 1130
  • [22] A comprehensive framework for modeling heterogeneous objects
    Conde-Rodriguez, Francisco
    Torres, Juan-Carlos
    Garcia-Fernandez, Angel-Luis
    Feito-Higueruela, Francisco-Ramon
    VISUAL COMPUTER, 2017, 33 (01) : 17 - 31
  • [24] A probabilistic relational approach for web document clustering
    Fersini, E.
    Messina, E.
    Archetti, F.
    INFORMATION PROCESSING & MANAGEMENT, 2010, 46 (02) : 117 - 130
  • [25] PROBABILISTIC HEURISTICS FOR HIERARCHICAL WEB DATA CLUSTERING
    Chehreghani, Morteza Haghir
    Chehreghani, Mostafa Haghir
    Abolhassani, Hassan
    COMPUTATIONAL INTELLIGENCE, 2012, 28 (02) : 209 - 233
  • [26] Web Objects Clustering Using Transaction Log
    Jia Rongfei
    Jin Maozhong
    Wang Xiaobo
    THIRD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING: WKDD 2010, PROCEEDINGS, 2010, : 182 - 186
  • [27] A framework for consistent, replicated Web objects
    Kermarrec, AM
    Kuz, I
    van Steen, M
    Tanenbaum, AS
    18TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS, PROCEEDINGS, 1998, : 276 - 284
  • [28] A Unified Framework for Spectral Clustering in Sparse Graphs
    Dall'Amico, Lorenzo
    Couillet, Romain
    Tremblay, Nicolas
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [29] A UNIFIED FRAMEWORK FOR TUNING HYPERPARAMETERS IN CLUSTERING PROBLEMS
    Fan, Xinjie
    Wang, Y. X. Rachel
    Sarkar, Purnamrita
    Yue, Yuguang
    STATISTICA SINICA, 2024, 34 (02) : 933 - 954
  • [30] A unified framework for model-based clustering
    Zhong, S
    Ghosh, J
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 4 (06) : 1001 - 1037