Summable and nonsummable data-driven models for community detection in feature-rich networks

Cited by: 8
Authors
Shalileh, Soroosh [1, 2]
Mirkin, Boris [1, 3]
Affiliations
[1] HSE Univ, Lab Methods Big Data Anal, Pokrovsky Blvd 11, Moscow, Russia
[2] HSE Univ, Lab Methods Big Data Anal, Pokrovsky Blvd 11, Moscow, Russia
[3] Birkbeck Univ London, Dept Comp Sci & Informat Syst, London WC1E 7HX, England
Keywords
Attributed network; Feature-rich network; Community detection; Sequential extraction; Least squares data recovery; One-by-one clustering; K-MEANS; ALGORITHM
DOI
10.1007/s13278-021-00774-8
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A feature-rich network is a network whose nodes are characterized by categorical or quantitative features. We propose a data-driven model for finding a partition of the nodes that approximates both the network link data and the feature data. The model involves summary quantitative characteristics of both network links and features. We distinguish between two modes of using the network link data. One mode postulates that the link values are comparable and summable across the network (summability); the other models the case in which different nodes represent different measurement systems, so that the link data are neither comparable nor summable across different nodes (nonsummability). We derive a Pythagorean decomposition of the combined data scatter involving our data recovery least-squares criterion. We address the equivalent problem of maximizing its complementary part, the contribution of a found partition to the combined data scatter. We follow a doubly greedy strategy in maximizing it: first, communities are found one-by-one, and second, entities are added one-by-one in the process of identifying a community. Our algorithms determine the number of clusters automatically. The nonsummability version proves to have a niche of its own; it is also faster than the other version. In our experiments, both versions appear to be competitive over generated synthetic data sets and six real-world data sets from the literature.
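The doubly greedy strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weights `rho` and `xi`, the seeding rule, the stopping rule, and the toy data are all assumptions made for illustration. The sketch covers only a summable-style setting, where a community `S` is scored by a least-squares contribution of the form `rho * |S| * sum_v c_v^2 + xi * |S|^2 * lam^2`, with `c` the within-community feature mean and `lam` the within-community mean link value.

```python
import numpy as np

def contribution(Y, P, members, rho=1.0, xi=1.0):
    """Assumed least-squares contribution of one community to the data scatter."""
    idx = np.array(members)
    c = Y[idx].mean(axis=0)               # within-community feature center
    lam = P[np.ix_(idx, idx)].mean()      # within-community mean link value
    n = len(idx)
    return float(rho * n * (c @ c) + xi * n ** 2 * lam ** 2)

def extract_community(Y, P, candidates, rho=1.0, xi=1.0):
    """Inner greedy loop: add entities one-by-one while the contribution grows."""
    # Assumed seeding rule: start from the best singleton.
    seed = max(candidates, key=lambda i: contribution(Y, P, [i], rho, xi))
    members = [seed]
    rest = [i for i in candidates if i != seed]
    best = contribution(Y, P, members, rho, xi)
    while rest:
        value, node = max(
            (contribution(Y, P, members + [i], rho, xi), i) for i in rest
        )
        if value <= best:                 # no entity improves the criterion
            break
        members.append(node)
        rest.remove(node)
        best = value
    return members

def greedy_partition(Y, P, rho=1.0, xi=1.0):
    """Outer greedy loop: extract communities one-by-one until no nodes remain."""
    remaining = list(range(Y.shape[0]))
    communities = []
    while remaining:
        found = extract_community(Y, P, remaining, rho, xi)
        communities.append(sorted(found))
        remaining = [i for i in remaining if i not in found]
    return communities

# Toy data (hypothetical): two groups of three nodes each.
Y = np.array([[1.0], [1.0], [1.0], [-1.0], [-1.0], [-1.0]])  # centered feature
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)
P = A - A.mean()                          # links shifted by their overall mean

communities = greedy_partition(Y, P)
print(communities)                        # [[0, 1, 2], [3, 4, 5]]
```

In this sketch the number of communities is not supplied in advance; it simply falls out of how many times the outer loop runs before every node is assigned, which mirrors the abstract's remark that the number of clusters is determined automatically. The nonsummable mode is not shown here.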
Pages: 23
Source
Social Network Analysis and Mining, 2021, 11