Clustering large attributed information networks: an efficient incremental computing approach

Cited by: 36
Authors
Cheng, Hong [1]
Zhou, Yang [2]
Huang, Xin [1]
Yu, Jeffrey Xu [1]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[2] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Keywords
Graph clustering; Incremental computation; Parallel computing; RANDOM-WALK; RESTART;
DOI
10.1007/s10618-012-0263-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, many information networks have become available for analysis, including social networks, road networks, sensor networks, and biological networks. Graph clustering has proven effective for analyzing and visualizing large networks. The goal of graph clustering is to partition the vertices of a large graph into clusters based on criteria such as vertex connectivity or neighborhood similarity. Many existing graph clustering methods focus mainly on topological structure and largely ignore vertex properties, which are often heterogeneous. Recently, a new graph clustering algorithm, SA-Cluster, has been proposed which combines structural and attribute similarities through a unified distance measure. SA-Cluster performs matrix multiplication to calculate the random walk distances between graph vertices. As part of the clustering refinement, the graph edge weights are iteratively adjusted to balance the relative importance of structural and attribute similarities. As a consequence, matrix multiplication is repeated in each iteration of the clustering process to recalculate the random walk distances, which are affected by the edge weight update. To improve the efficiency and scalability of SA-Cluster, in this paper we propose an efficient algorithm, Inc-Cluster, to incrementally update the random walk distances given the edge weight increments. Complexity analysis is provided to estimate how much runtime cost Inc-Cluster can save. We further design parallel matrix computation techniques on a multicore architecture. Experimental results demonstrate that Inc-Cluster achieves significant speedup over SA-Cluster on large graphs, while achieving exactly the same clustering quality in terms of intra-cluster structural cohesiveness and attribute value homogeneity.
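The abstract says that random walk distances are obtained by matrix multiplication and must be recalculated whenever edge weights change. The sketch below illustrates that baseline cost only; it is not the paper's Inc-Cluster update. The abstract does not state the exact formula, so the truncated random-walk-with-restart series R = sum over l = 1..L of c(1-c)^l P^l is an assumption, as are the function names, the restart probability 0.15, the walk length 4, and the toy weights.

```python
def mat_mul(a, b):
    """Plain dense matrix product of two lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def row_normalize(weights):
    """Turn edge weights into a row-stochastic transition matrix P."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([x / total if total else 0.0 for x in row])
    return normalized


def random_walk_distance(weights, restart=0.15, max_len=4):
    """Assumed form: R = sum_{l=1..max_len} restart * (1 - restart)**l * P**l."""
    p = row_normalize(weights)
    n = len(p)
    r = [[0.0] * n for _ in range(n)]
    power = p  # holds P**l for the current l
    for length in range(1, max_len + 1):
        coef = restart * (1.0 - restart) ** length
        for i in range(n):
            for j in range(n):
                r[i][j] += coef * power[i][j]
        if length < max_len:
            power = mat_mul(power, p)  # one matrix product per extra step
    return r


# Toy 3-vertex graph (hypothetical weights).
weights = [[0, 1, 1],
           [1, 0, 2],
           [1, 2, 0]]
r = random_walk_distance(weights)

# An edge weight update (2 -> 4 between vertices 1 and 2) forces the whole
# series to be recomputed from scratch in this naive version.
updated = [[0, 1, 1],
           [1, 0, 4],
           [1, 4, 0]]
r_updated = random_walk_distance(updated)
```

Every weight adjustment in this sketch repeats all max_len - 1 matrix products, which is the repeated work the abstract says Inc-Cluster is designed to avoid by updating the distances incrementally.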
Pages: 450-477
Page count: 28