Summable and nonsummable data-driven models for community detection in feature-rich networks

Cited by: 8

Authors:

Shalileh, Soroosh [1,2]

Mirkin, Boris [1,3]

Affiliations:

[1] HSE Univ, Lab Methods Big Data Anal, Pokrovsky Blvd 11, Moscow, Russia

[2] HSE Univ, Lab Methods Big Data Anal, Pokrovsky Blvd 11, Moscow, Russia

[3] Birkbeck Univ London, Dept Comp Sci & Informat Syst, London WC1E 7HX, England

Source:

SOCIAL NETWORK ANALYSIS AND MINING | 2021 / Vol. 11 / Issue 01

Keywords:

Attributed network; Feature-rich network; Community detection; Sequential extraction; Least squares data recovery; One-by-one clustering; K-MEANS; ALGORITHM;

DOI:

10.1007/s13278-021-00774-8

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

A feature-rich network is a network whose nodes are characterized by categorical or quantitative features. We propose a data-driven model for finding a partition of the nodes to approximate both the network link data and the feature data. The model involves summary quantitative characteristics of both network links and features. We distinguish between two modes of using the network link data. One mode postulates that the link values are comparable and summable across the network (summability); the other mode models the case in which different nodes represent different measurement systems, so that the link data are neither comparable nor summable across different nodes (nonsummability). We derive a Pythagorean decomposition of the combined data scatter involving our data recovery least-squares criterion. We address an equivalent problem of maximizing its complementary part, the contribution of a found partition to the combined data scatter. We follow a doubly greedy strategy in maximizing that contribution: first, communities are found one by one; second, entities are added one by one in the process of identifying a community. Our algorithms determine the number of clusters automatically. The nonsummability version proves to have a niche of its own; also, it is faster than the other version. In our experiments, our algorithms appear to be competitive on generated synthetic data sets and six real-world data sets from the literature.
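The doubly greedy strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the criterion used here, |S| (rho * ||c||^2 + xi * ||lambda||^2) with c and lambda the within-cluster means of the centered feature and link rows, is an assumed stand-in for a least-squares contribution to the combined data scatter. The function names, the weights `rho` and `xi`, and the `summable` flag (global mean shift versus per-column centering of the link matrix) are all hypothetical choices made for illustration.

```python
import numpy as np


def contribution(Y, P, members, rho=1.0, xi=1.0):
    """Assumed criterion: cluster size times the weighted squared norms of
    the cluster's mean feature row and mean link row."""
    idx = list(members)
    c = Y[idx].mean(axis=0)    # feature center of the cluster
    lam = P[idx].mean(axis=0)  # link center of the cluster
    return len(idx) * (rho * (c @ c) + xi * (lam @ lam))


def extract_one(Y, P, free, rho=1.0, xi=1.0):
    """Inner greedy loop: grow one community by adding nodes one by one."""
    free = sorted(free)
    seed = max(free, key=lambda i: contribution(Y, P, [i], rho, xi))
    cluster = [seed]
    best = contribution(Y, P, cluster, rho, xi)
    candidates = [i for i in free if i != seed]
    while candidates:
        scores = [contribution(Y, P, cluster + [i], rho, xi) for i in candidates]
        j = int(np.argmax(scores))
        if scores[j] <= best:  # no node increases the contribution: stop
            break
        best = scores[j]
        cluster.append(candidates.pop(j))
    return cluster


def sequential_extraction(Y, P, rho=1.0, xi=1.0, summable=False):
    """Outer greedy loop: extract communities one by one until no node is left.
    The number of communities is not given in advance."""
    Y = np.asarray(Y, dtype=float)
    P = np.asarray(P, dtype=float)
    Y = Y - Y.mean(axis=0)
    # Assumption: summable links share one scale (single global shift);
    # nonsummable links are centered column by column.
    P = P - (P.mean() if summable else P.mean(axis=0))
    free = set(range(Y.shape[0]))
    labels = np.full(Y.shape[0], -1)
    k = 0
    while free:
        cluster = extract_one(Y, P, free, rho, xi)
        labels[cluster] = k
        free -= set(cluster)
        k += 1
    return labels
```

Calling `sequential_extraction(Y, P)` with an n-by-V feature matrix `Y` and an n-by-n link matrix `P` returns one integer label per node. The stopping rule in `extract_one` is what lets the number of communities emerge from the data in this sketch.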


Pages: 23
