GMMDA: Gaussian mixture modeling of graph in latent space for graph data augmentation

Cited by: 0
Authors
Li, Yanjin [1 ]
Xu, Linchuan [1 ]
Yamanishi, Kenji [1 ]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Math Informat, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
Funding
Japan Science and Technology Agency (JST)
Keywords
Graph data augmentation; Graph neural networks; Gaussian mixture model; Semi-supervised learning; Minimum description length principle;
DOI
10.1007/s10115-024-02207-2
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Graph data augmentation (GDA), which manipulates graph structure and/or attributes, has been demonstrated to be an effective method for improving the generalization of graph neural networks on semi-supervised node classification. As a data augmentation technique, label preservation is critical: node labels should not change after data manipulation. However, most existing methods overlook this requirement. Determining whether a GDA method is label-preserving is highly challenging, owing to the non-Euclidean nature of graph structure. In this study, we formulate, for the first time, a label-preserving problem (LPP) in the context of GDA. The LPP is an optimization problem in which, given a fixed augmentation budget, the objective is to find an augmented graph whose data distribution differs minimally from that of the original graph. To solve the LPP, we propose GMMDA, a generative data augmentation (DA) method based on Gaussian mixture modeling (GMM) of a graph in a latent space. We design a novel learning objective that jointly learns a low-dimensional graph representation and estimates the GMM. Learning is followed by sampling from the GMM, and the samples are converted back to the graph as additional nodes. To uphold label preservation, we design a minimum description length (MDL)-based method to select the set of samples that produces the minimum shift in the data distribution captured by the GMM.
Through experiments, we demonstrate that GMMDA can improve the performance of graph convolutional networks on Cora, Citeseer, and Pubmed by as much as 7.75%, 8.75%, and 5.87%, respectively, significantly outperforming state-of-the-art methods.
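The pipeline described in the abstract (learn a latent representation, fit a GMM, sample candidate nodes, then keep only samples that shift the captured distribution least) can be sketched roughly as follows. This is an illustrative assumption-laden sketch, not the authors' implementation: the embedding matrix is synthetic, the component count is chosen by hand, and a simple highest-log-density rule stands in for the paper's MDL-based sample selection.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in for learned low-dimensional node embeddings:
# 300 nodes, 16 dims, drawn from 3 latent clusters.
Z = np.concatenate(
    [rng.normal(loc=c, scale=0.5, size=(100, 16)) for c in (-2.0, 0.0, 2.0)]
)

# Step 1: estimate a Gaussian mixture over the latent space.
gmm = GaussianMixture(n_components=3, random_state=0).fit(Z)

# Step 2: sample more candidate nodes than the augmentation budget allows.
candidates, _ = gmm.sample(50)

# Step 3: keep the 10 candidates with the highest log-density under the
# fitted GMM -- a crude proxy for selecting samples that minimally shift
# the data distribution (the paper instead uses an MDL criterion).
log_density = gmm.score_samples(candidates)
keep = np.argsort(log_density)[-10:]
Z_aug = candidates[keep]

print(Z_aug.shape)  # (10, 16); these would be added to the graph as nodes
```

In the actual method, the embeddings are learned jointly with the GMM and the selected samples are mapped back to graph structure and attributes; this sketch only shows the sample-then-select step in isolation.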
Pages: 29