Semi-supervised classification and betweenness computation on large, sparse, directed graphs

Cited by: 28
Authors
Mantrach, Amin [1]
van Zeebroeck, Nicolas [1]
Francq, Pascal [2]
Shimbo, Masashi [3]
Bersini, Hugues [1]
Saerens, Marco [2]
Affiliations
[1] Univ Libre Bruxelles, IRIDIA Lab, B-1050 Brussels, Belgium
[2] Univ Louvain, ISYS LSM & Machine Learning Grp, B-1348 Louvain, Belgium
[3] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Keywords
Graph mining; Semi-supervised classification; Within-network classification; Betweenness centrality; Graph-based classification; Kernel methods; Kernel on a graph; Large-scale graphs; RANDOM-WALK; SPILLOVERS; RESTART;
DOI
10.1016/j.patcog.2010.11.019
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work addresses graph-based semi-supervised classification and betweenness computation in large, sparse networks (several million nodes). The objective of semi-supervised classification is to assign a label to each unlabeled node using the whole topology of the graph and the labels already available. Two approaches are developed to avoid explicit computation of pairwise proximities between the nodes of the graph, which would be impractical for graphs containing millions of nodes. The first approach directly computes, for each class, the sum of the similarities between the nodes to classify and the labeled nodes of the class, as suggested initially in [1,2]. Following this approach, two algorithms exploiting different state-of-the-art kernels on a graph are developed. The same strategy can also be used to compute a betweenness measure. The second approach works on a trellis structure built from biased random walks on the graph, extending an idea introduced in [3]. These random walks make it possible to define a biased bounded betweenness for the nodes of interest, defined separately for each class. All the proposed algorithms have a computing time that is linear in the number of edges while providing good results, and hence are applicable to large sparse networks. They are empirically validated on medium-size standard data sets and are shown to be competitive with state-of-the-art techniques. Finally, we processed a novel data set, which is made available for benchmarking, for multi-class classification in a large network: the U.S. patents citation network containing 3M nodes (of six different classes) and 38M edges. The three proposed algorithms achieve competitive results (around 85% classification rate) on this large network; they classify the unlabeled nodes within a few minutes on a standard workstation. © 2010 Elsevier Ltd. All rights reserved.
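The sum-of-similarities idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a generic discounted random-walk kernel K = (I - alpha P)^-1, where P is the row-normalized transition matrix, and approximates the per-class score vector K y_c by a fixed number of sparse iterations, so K itself is never formed and each iteration visits every edge once. The function names, `alpha`, `n_iter`, and the toy graph are all hypothetical.

```python
def sum_of_similarities(edges, n_nodes, labels, alpha=0.8, n_iter=30):
    """Per-class scores s_c ~ (I - alpha*P)^-1 y_c, computed iteratively.

    Assumption: a discounted random-walk kernel stands in for the
    kernels used in the paper. The kernel matrix is never built.
    """
    # Out-neighbour lists of the directed graph.
    out = [[] for _ in range(n_nodes)]
    for u, v in edges:
        out[u].append(v)

    scores = {}
    for c in sorted(set(labels.values())):
        # Indicator vector of the labeled nodes of class c.
        y = [1.0 if labels.get(i) == c else 0.0 for i in range(n_nodes)]
        s = y[:]
        for _ in range(n_iter):
            nxt = y[:]
            for u in range(n_nodes):
                if out[u]:
                    # (P s)[u] is the mean of s over the out-neighbours of u.
                    nxt[u] += alpha * sum(s[v] for v in out[u]) / len(out[u])
            s = nxt  # one pass over all edges per iteration
        scores[c] = s
    return scores


def classify(scores, n_nodes, labels):
    """Assign each unlabeled node to the class with the largest score."""
    return {
        i: max(sorted(scores), key=lambda c: scores[c][i])
        for i in range(n_nodes)
        if i not in labels
    }


# Toy example: a six-node chain with edges in both directions,
# one labeled node at each end.
chain = [(i, i + 1) for i in range(5)] + [(i + 1, i) for i in range(5)]
known = {0: "a", 5: "b"}
pred = classify(sum_of_similarities(chain, 6, known), 6, known)
```

In this toy chain, nodes nearer to node 0 receive class "a" and nodes nearer to node 5 receive class "b". The cost is proportional to the number of classes times the number of iterations times the number of edges, which is the kind of edge-linear behaviour the abstract refers to.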
Pages: 1212-1224
Number of pages: 13
Related papers
50 in total
  • [1] Sparse regularization for semi-supervised classification
    Fan, Mingyu
    Gu, Nannan
    Qiao, Hong
    Zhang, Bo
    [J]. PATTERN RECOGNITION, 2011, 44 (08) : 1777 - 1784
  • [2] Sparse graphs with smoothness constraints: Application to dimensionality reduction and semi-supervised classification
    Dornaika, F.
    Weng, L.
    [J]. PATTERN RECOGNITION, 2019, 95 : 285 - 295
  • [3] SNCA: Semi-Supervised Node Classification for Evolving Large Attributed Graphs
    Abbasi, Faima
    Muzammal, Muhammad
    Qu, Qiang
    Riaz, Farhan
    Ashraf, Jawad
    [J]. BIG DATA MINING AND ANALYTICS, 2024, 7 (03) : 794 - 808
  • [4] Statistical mechanics of semi-supervised clustering in sparse graphs
    Steeg, Greg Ver
    Galstyan, Aram
    Allahverdyan, Armen E.
    [J]. JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2011,
  • [5] Combining smooth graphs with semi-supervised classification
    Zhou, Xueyuan
    Li, Chunping
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918 : 400 - 409
  • [6] Semi-supervised classification based on smooth graphs
    Zhou, Xueyuan
    Li, Chunping
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2006, 3882 : 757 - 766
  • [7] Semi-supervised Hyperspectral Image Classification with Graphs
    Bandos, Tatyana V.
    Zhou, Dengyong
    Camps-Valls, Gustavo
    [J]. 2006 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, VOLS 1-8, 2006, : 3883 - +
  • [8] Regularization and semi-supervised learning on large graphs
    Belkin, M
    Matveeva, I
    Niyogi, P
    [J]. LEARNING THEORY, PROCEEDINGS, 2004, 3120 : 624 - 638
  • [9] Semi-supervised classification based on subspace sparse representation
    Yu, Guoxian
    Zhang, Guoji
    Zhang, Zili
    Yu, Zhiwen
    Deng, Lin
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 43 (01) : 81 - 101