GMMDA: Gaussian mixture modeling of graph in latent space for graph data augmentation

Cited: 0
|
Authors
Li, Yanjin [1 ]
Xu, Linchuan [1 ]
Yamanishi, Kenji [1 ]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Math Informat, 7-3-1 Hongo,Bunkyo Ku, Tokyo 1138656, Japan
Funding
Japan Science and Technology Agency;
Keywords
Graph data augmentation; Graph neural networks; Gaussian mixture model; Semi-supervised learning; Minimum description length principle;
DOI
10.1007/s10115-024-02207-2
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Graph data augmentation (GDA), which manipulates graph structure and/or attributes, has been demonstrated as an effective method for improving the generalization of graph neural networks on semi-supervised node classification. As a data augmentation technique, label preservation is critical, that is, node labels should not change after data manipulation. However, most existing methods overlook this label-preservation requirement. Determining the label-preserving nature of a GDA method is highly challenging, owing to the non-Euclidean nature of the graph structure. In this study, for the first time, we formulate a label-preserving problem (LPP) in the context of GDA. The LPP is formulated as an optimization problem in which, given a fixed augmentation budget, the objective is to find an augmented graph with minimal difference in data distribution compared to the original graph. To solve the LPP, we propose GMMDA, a generative data augmentation (DA) method based on Gaussian mixture modeling (GMM) of a graph in a latent space. We design a novel learning objective that jointly learns a low-dimensional graph representation and estimates the GMM. The learning is followed by sampling from the GMM, and the samples are converted back to the graph as additional nodes. To uphold label preservation, we design a minimum description length (MDL)-based method to select a set of samples that produces the minimum shift in the data distribution captured by the GMM.
Through experiments, we demonstrate that GMMDA can improve the performance of a graph convolutional network on Cora, Citeseer and Pubmed by as much as 7.75%, 8.75% and 5.87%, respectively, significantly outperforming the state-of-the-art methods.
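The pipeline the abstract describes (fit a GMM to latent node embeddings, sample candidate nodes, then keep only the samples that least shift the captured distribution) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embeddings are synthetic stand-ins for a learned graph representation, and per-sample GMM log-likelihood is used as a cheap proxy for the paper's MDL-based selection criterion.

```python
# Sketch of the GMMDA idea: GMM in latent space -> sample -> select under budget.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for learned low-dimensional node embeddings (three latent classes).
Z = np.vstack([rng.normal(loc=m, scale=0.3, size=(50, 2))
               for m in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])])

# Jointly in the paper; here we simply fit a GMM with one component per class.
gmm = GaussianMixture(n_components=3, random_state=0).fit(Z)

# Sample candidate augmentation nodes from the estimated mixture.
candidates, _ = gmm.sample(n_samples=30)

# Fixed augmentation budget: keep the candidates the GMM scores as most typical,
# approximating "minimum shift in the data distribution captured by the GMM".
budget = 10
scores = gmm.score_samples(candidates)        # per-sample log-likelihood
keep = candidates[np.argsort(scores)[-budget:]]

print(keep.shape)  # ten synthetic latent nodes to map back into the graph
```

In the paper the selected samples are converted back to the graph as additional nodes; here they simply remain latent vectors of shape `(budget, 2)`.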
Pages: 29
Related Papers
50 items
  • [21] Graph signal interpolation and extrapolation over manifold of Gaussian mixture
    Zach, Itay
    Dvorkind, Tsvi G.
    Talmon, Ronen
    [J]. SIGNAL PROCESSING, 2024, 216
  • [22] Modeling and Storage of XML Data as a Graph and Processing with Graph Processor
    Sanal, A.
    Suganthi, G.
    [J]. 2017 2ND WORLD CONGRESS ON COMPUTING AND COMMUNICATION TECHNOLOGIES (WCCCT), 2017, : 16 - 19
  • [23] GAUSSIAN DISTRIBUTED GRAPH CONSTRAINED MULTI-MODAL GAUSSIAN PROCESS LATENT VARIABLE MODEL FOR ORDINAL LABELED DATA
    Maeda, Keisuke
    Ogawa, Takahiro
    Haseyama, Miki
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3798 - 3802
  • [24] Deep Clustering by Gaussian Mixture Variational Autoencoders with Graph Embedding
    Yang, Linxiao
    Cheung, Ngai-Man
    Li, Jiaying
    Fang, Jun
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6449 - 6458
  • [26] Large Margin Gaussian Mixture Classifier With a Gabriel Graph Geometric Representation of Data Set Structure
    Torres, Luiz C. B.
    Castro, Cristiano L.
    Coelho, Frederico
    Braga, Antonio P.
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (03) : 1400 - 1406
  • [27] Latent mixture modeling for clustered data
    Sugasawa, Shonosuke
    Kobayashi, Genya
    Kawakubo, Yuki
    [J]. STATISTICS AND COMPUTING, 2019, 29 (03) : 537 - 548
  • [29] AutoGDA: Automated Graph Data Augmentation for Node Classification
    Zhao, Tong
    Tang, Xianfeng
    Zhang, Danqing
    Jiang, Haoming
    Rao, Nikhil
    Song, Yiwei
    Agrawal, Pallav
    Subbian, Karthik
    Yin, Bing
    Jiang, Meng
    [J]. LEARNING ON GRAPHS CONFERENCE, VOL 198, 2022, 198
  • [30] Latent Gaussian Processes Based Graph Learning for Urban Traffic Prediction
    Wang, Xu
    Wang, Pengkun
    Wang, Binwu
    Zhang, Yudong
    Zhou, Zhengyang
    Bai, Lei
    Wang, Yang
    [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2024, 73 (01) : 282 - 294