Attributed Graph Clustering Approach Based on Dynamic Cluster Formation Game

Cited by: 0
Authors
Bu Z. [1]
Wang Y.-Y. [2]
Ma L.-N. [3]
Jiang J.-C. [2]
Cao J. [2]
Affiliations
[1] Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing
[2] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing
[3] WinGin Business-Intelligence Academy Nanjing Co., Ltd, Nanjing
Funding
National Natural Science Foundation of China
Keywords
Attributed graph clustering; Autonomy-oriented computing; Dynamic cluster formation game; Locally Pareto optimality; Multi-objective optimization
DOI
10.11897/SP.J.1016.2021.01824
Abstract
In addition to rich node attribute information, some modern online social networks, such as Sina Weibo and WeChat, contain complex topological information. Such social networks can usually be represented as attributed graphs. Traditional graph clustering approaches often assume that node attributes and network topology share the same cluster membership. However, this assumption does not always hold in real-world social networks. Take Sina Weibo as an example: analyzing users' follow lists with community detection techniques directly reveals which users gather into a social group, yet these users may produce diverse user-generated content that reflects different preferences. How to effectively integrate attribute and topological information when clustering attributed graphs has therefore become a new challenge, one that is critical for understanding, analyzing, and visualizing large-scale social networks. In this paper, we formulated the target problem as a multi-objective optimization problem and proposed an attributed graph clustering approach based on a dynamic cluster formation game. First, we defined a new centrality index, called node influence, and designed an effective heuristic method to initialize the cluster centroids of attributed graphs. Second, based on dynamic game theory, we proposed a greedy local search strategy to update the cluster labels of nodes, and we rigorously proved that this strategy makes the cluster structure converge to local Pareto optimality. Third, we proposed an attributed graph clustering algorithm based on autonomy-oriented computing, which does not require the number of clusters to be specified and whose running time scales linearly with the total number of edges. Furthermore, we tested and evaluated the performance of the proposed approach from three aspects.
First, we performed a separate convergence analysis of the proposed approach on the Google+ attributed social network. We tested the convergence of the four objective functions that the approach optimizes (i.e., K-means loss function, Havrda-Charvat generation entropy, negative modularity, and negative compactness) under three different Bregman divergence settings (i.e., squared Euclidean distance, KL divergence, and cosine distance). The results showed that all four objective functions converged after 50 iterations. Then, we compared the proposed approach with 9 baseline methods in terms of accuracy and scalability on 4 large-scale attributed social networks. In terms of clustering accuracy, the proposed approach scored at least 0.7% higher than the best-performing competing algorithms under the NMI metric, and at least 0.2% higher than most of the best-performing algorithms under the AvgF1 metric. In terms of scalability, the proposed approach obtained final results within 1 hour even on the largest network, the Google+ attributed social network. Finally, we performed a visualization analysis on the small PolBK network. The results showed that the proposed approach reached a stable state after 14 rounds of iteration, and that the uncovered cluster structure was close to the ground truth. Overall, extensive experiments show that the proposed approach can accurately detect the hidden cluster structure in real-world attributed graphs. Compared with state-of-the-art approaches for clustering nodes in attributed graphs, our approach is both more effective and more efficient. © 2021, Science Press. All rights reserved.
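As a rough illustration of two ideas named in the abstract, the following Python sketch implements the three distance settings (squared Euclidean distance, KL divergence, cosine distance) and a Pareto-improvement test that a greedy label update could use. It is a minimal sketch and not the authors' algorithm: the function names `pareto_improves` and `best_response`, the tolerance `tol`, the smoothing constant `eps`, and the representation of objectives as tuples of costs to minimize are all assumptions made for illustration.

```python
import math
from typing import Dict, Hashable, Sequence, Tuple


def squared_euclidean(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between an attribute vector and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL divergence D(p || q); eps (an assumption) guards against log(0)."""
    return sum(a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0)


def cosine_distance(x: Sequence[float], c: Sequence[float]) -> float:
    """One minus cosine similarity; defined as 1.0 when either vector is zero."""
    dot = sum(a * b for a, b in zip(x, c))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in c))
    return 1.0 - dot / norm if norm > 0 else 1.0


def pareto_improves(old: Tuple[float, ...], new: Tuple[float, ...], tol: float = 1e-12) -> bool:
    """True if `new` is no worse on every objective and strictly better on one.

    All objectives are treated as costs to minimize (e.g. K-means loss,
    negative modularity).
    """
    no_worse = all(n <= o + tol for o, n in zip(old, new))
    strictly_better = any(n < o - tol for o, n in zip(old, new))
    return no_worse and strictly_better


def best_response(current_label: Hashable,
                  candidate_costs: Dict[Hashable, Tuple[float, ...]]) -> Hashable:
    """Hypothetical greedy step: adopt the first label that Pareto-improves.

    `candidate_costs` maps each candidate label (including the current one)
    to the objective values the node would see after adopting that label.
    """
    current = candidate_costs[current_label]
    for label, costs in candidate_costs.items():
        if label != current_label and pareto_improves(current, costs):
            return label
    return current_label  # no Pareto-improving move: the node keeps its label
```

When no node has a Pareto-improving move left, no single label change can lower one objective without raising another, which is the intuition behind a locally Pareto-optimal cluster structure.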
Pages: 1824-1840
Number of pages: 16