Clustering large attributed information networks: an efficient incremental computing approach

Times cited: 36
Authors
Cheng, Hong [1]
Zhou, Yang [2]
Huang, Xin [1]
Yu, Jeffrey Xu [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Keywords
Graph clustering; Incremental computation; Parallel computing; Random walk; Restart
DOI
10.1007/s10618-012-0263-0
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, many information networks have become available for analysis, including social networks, road networks, sensor networks, and biological networks. Graph clustering has proven effective for analyzing and visualizing large networks. The goal of graph clustering is to partition the vertices of a large graph into clusters based on criteria such as vertex connectivity or neighborhood similarity. Many existing graph clustering methods focus mainly on topological structure and largely ignore vertex properties, which are often heterogeneous. Recently, a new graph clustering algorithm, SA-Cluster, was proposed that combines structural and attribute similarities through a unified distance measure. SA-Cluster performs matrix multiplication to calculate the random walk distances between graph vertices. As part of the clustering refinement, the graph edge weights are iteratively adjusted to balance the relative importance of structural and attribute similarities. Consequently, the matrix multiplication is repeated in each iteration of the clustering process to recalculate the random walk distances affected by the edge weight update. To improve the efficiency and scalability of SA-Cluster, in this paper we propose an efficient algorithm, Inc-Cluster, that incrementally updates the random walk distances given the edge weight increments. We provide a complexity analysis to estimate how much runtime cost Inc-Cluster can save. We further design parallel matrix computation techniques on a multicore architecture. Experimental results demonstrate that Inc-Cluster achieves a significant speedup over SA-Cluster on large graphs while producing exactly the same clustering quality in terms of intra-cluster structural cohesiveness and attribute value homogeneity.
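The core computation described in the abstract can be illustrated with a minimal sketch. It assumes a truncated random-walk-with-restart distance R = sum over k = 1..l of c(1-c)^k P^k on a row-normalized attribute-augmented graph, where structure edges have weight 1 and every vertex-to-attribute-value edge shares one weight omega. The toy graph, the restart probability c = 0.15, the walk length l = 4, and all function names are hypothetical. After omega changes, the sketch computes R from scratch and also updates the matrix powers from the increment dP = P' - P using the identity Delta_k = P^(k-1) dP + Delta_(k-1) P', which follows from expanding P'^k = (P^(k-1) + Delta_(k-1))(P + dP). This is not Inc-Cluster itself: in dense form the identity saves no work, and the paper's savings depend on how it exploits the structure of the weight increments, which this sketch does not attempt to reproduce. It only shows why an exact incremental update yields the same distances, and therefore the same clustering, as full recomputation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product a @ b (the expensive step SA-Cluster repeats)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def combine(a: Matrix, b: Matrix, alpha: float = 1.0, beta: float = 1.0) -> Matrix:
    """Element-wise alpha * a + beta * b."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def transition_matrix(w: Matrix) -> Matrix:
    """Row-normalize a non-negative weight matrix into transition probabilities."""
    out = []
    for row in w:
        total = sum(row)
        out.append([x / total for x in row])
    return out


def augmented_weights(omega: float) -> Matrix:
    """Hypothetical toy graph: structure vertices 0-1-2 (edge weight 1) plus two
    attribute-value vertices 3 and 4; each vertex-attribute edge has weight omega."""
    return [
        [0.0, 1.0, 0.0, omega, 0.0],    # v0: neighbor v1, attribute value 3
        [1.0, 0.0, 1.0, omega, 0.0],    # v1: neighbors v0 and v2, value 3
        [0.0, 1.0, 0.0, 0.0, omega],    # v2: neighbor v1, value 4
        [omega, omega, 0.0, 0.0, 0.0],  # value 3: shared by v0 and v1
        [0.0, 0.0, omega, 0.0, 0.0],    # value 4: held by v2 only
    ]


def powers(p: Matrix, length: int) -> List[Matrix]:
    """Return [P^1, ..., P^length] by repeated multiplication."""
    out = [p]
    for _ in range(length - 1):
        out.append(matmul(out[-1], p))
    return out


def rw_distance(pows: List[Matrix], c: float) -> Matrix:
    """Truncated random walk with restart: R = sum_k c * (1 - c)^k * P^k."""
    n = len(pows[0])
    r = [[0.0] * n for _ in range(n)]
    for k, pk in enumerate(pows, start=1):
        r = combine(r, pk, 1.0, c * (1 - c) ** k)
    return r


def incremental_powers(old: List[Matrix], p_new: Matrix) -> List[Matrix]:
    """Turn [P^1..P^l] into [P'^1..P'^l] via Delta_1 = dP and
    Delta_k = P^(k-1) dP + Delta_(k-1) P', where P'^k = P^k + Delta_k."""
    dp = combine(p_new, old[0], 1.0, -1.0)
    delta, new = dp, [p_new]
    for k in range(2, len(old) + 1):
        delta = combine(matmul(old[k - 2], dp), matmul(delta, p_new))
        new.append(combine(old[k - 1], delta))
    return new


c, walk_len = 0.15, 4  # hypothetical restart probability and walk length
old_pows = powers(transition_matrix(augmented_weights(1.0)), walk_len)
p_new = transition_matrix(augmented_weights(1.5))  # attribute weight increased

r_full = rw_distance(powers(p_new, walk_len), c)             # recompute from scratch
r_inc = rw_distance(incremental_powers(old_pows, p_new), c)  # update from increment
max_err = max(abs(x - y) for rf, ri in zip(r_full, r_inc) for x, y in zip(rf, ri))
print(f"max |R_full - R_inc| = {max_err:.1e}")
```

Running the script prints the largest entry-wise gap between the two distance matrices, which should be at floating-point round-off level.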
Pages: 450-477
Number of pages: 28
Related papers
50 in total
  • [1] Clustering large attributed information networks: an efficient incremental computing approach
    Cheng, Hong
    Zhou, Yang
    Huang, Xin
    Yu, Jeffrey Xu
    Data Mining and Knowledge Discovery, 2012, 25: 450-477
  • [2] INCREMENTAL CLUSTERING OF ATTRIBUTED GRAPHS
    SEONG, DS
    KIM, HS
    PARK, KH
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1993, 23 (05): 1399-1411
  • [3] Efficient Clustering Approach using Incremental and Hierarchical Clustering Methods
    Srinivas, M.
    Mohan, C. Krishna
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010
  • [4] Semi-supervised Clustering in Attributed Heterogeneous Information Networks
    Li, Xiang
    Wu, Yao
    Ester, Martin
    Kao, Ben
    Wang, Xin
    Zheng, Yudian
    PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'17), 2017: 1621-1629
  • [6] A clustering approach to incremental learning for feedforward neural networks
    Engelbrecht, AP
    Brits, R
    IJCNN'01: INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2001: 2019-2024
  • [7] SCHAIN-IRAM: An Efficient and Effective Semi-Supervised Clustering Algorithm for Attributed Heterogeneous Information Networks
    Li, Xiang
    Wu, Yao
    Ester, Martin
    Kao, Ben
    Wang, Xin
    Zheng, Yudian
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (04): 1980-1992
  • [8] Interpretable Probabilistic Divisive Clustering of Large Node-Attributed Networks
    Kaati, Lisa
    Ruul, Adam
    2017 EUROPEAN INTELLIGENCE AND SECURITY INFORMATICS CONFERENCE (EISIC), 2017: 68-75
  • [9] Semi-supervised Co-Clustering on Attributed Heterogeneous Information Networks
    Ji, Yugang
    Shi, Chuan
    Fang, Yuan
    Kong, Xiangnan
    Yin, Mingyang
    INFORMATION PROCESSING & MANAGEMENT, 2020, 57 (06)
  • [10] An efficient method for attributed graph clustering
    Wu, Ye
    Zhong, Zhi-Nong
    Xiong, Wei
    Chen, Luo
    Jing, Ning
    Jisuanji Xuebao/Chinese Journal of Computers, 2013, 36 (08): 1704-1713