Semi-Supervised Heterogeneous Graph Learning with Multi-Level Data Augmentation

Cited by: 0
Authors
Chen, Ying [1]
Qiang, Siwei [1]
Ha, Mingming [1,2]
Liu, Xiaolei [1]
Li, Shaoshuai [1]
Tong, Jiabi [1]
Yuan, Lingfeng [1]
Guo, Xiaobo [1,3]
Zhu, Zhenfeng [3]
Affiliations
[1] Ant Grp, MYbank, Hangzhou, Zhejiang, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing, Peoples R China
[3] Beijing Jiaotong Univ, Inst Informat Sci, Beijing, Peoples R China
Keywords
Semi-supervised learning; node augmentation; triangle augmentation
DOI
10.1145/3608953
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, semi-supervised graph learning with data augmentation (DA) has been the most commonly used and best-performing approach for improving model robustness in sparse scenarios with few labeled samples. However, most existing DA methods target homogeneous graphs, and none is designed specifically for heterogeneous graphs. DA on a heterogeneous graph faces two additional challenges. First, heterogeneity of information requires DA strategies to handle heterogeneous relations effectively, accounting for how much different types of neighbors and edges contribute to the target nodes. Second, over-squashing of information is caused by the negative curvature that arises from the non-uniform distribution and strong clustering in a complex graph. To address these challenges, this article presents a novel method named HG-MDA (Semi-Supervised Heterogeneous Graph Learning with Multi-Level Data Augmentation). For the heterogeneity problem, node and topology augmentation strategies are proposed that are tailored to the characteristics of the heterogeneous graph, and meta-relation-based attention is applied as one of the indexes for selecting augmented nodes and edges. For the over-squashing problem, triangle-based edge adding and removing are designed to alleviate the negative curvature and bring topological gain. Finally, the loss function consists of the cross-entropy loss for labeled data and the consistency regularization for unlabeled data, with sharpening used to effectively fuse the prediction results of the various DA strategies. Experiments on public datasets (i.e., ACM, DBLP, and OGB) and the industry dataset MB show that HG-MDA outperforms current SOTA models. Additionally, HG-MDA is applied to user identification in internet finance scenarios, helping the business to add 30% key users, and increase loans and balances by 3.6%, 11.1%, and 9.8%.
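The abstract names the ingredients of the training objective (cross-entropy on labeled data, consistency regularization on unlabeled data, and sharpening to fuse the predictions of several DA strategies) but gives no formulas. The following is a minimal sketch of how such an objective is commonly assembled, not the authors' implementation. It assumes temperature sharpening of the averaged prediction and a squared-error consistency term, as used in other consistency-regularization methods; the function names and the `temperature` and `lam` parameters are hypothetical.

```python
import math


def sharpen(probs, temperature=0.5):
    """Temperature sharpening: p_i^(1/T) / sum_j p_j^(1/T) (assumed form)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def fuse_predictions(aug_probs, temperature=0.5):
    """Average one node's predictions over all augmentations, then sharpen."""
    n_aug = len(aug_probs)
    n_cls = len(aug_probs[0])
    mean = [sum(p[c] for p in aug_probs) / n_aug for c in range(n_cls)]
    return sharpen(mean, temperature)


def supervised_loss(aug_probs, label):
    """Cross-entropy for a labeled node, averaged over its augmented views."""
    return -sum(math.log(p[label]) for p in aug_probs) / len(aug_probs)


def consistency_loss(aug_probs, temperature=0.5):
    """Mean squared L2 distance between each view and the fused target."""
    target = fuse_predictions(aug_probs, temperature)
    squared = [
        sum((p[c] - target[c]) ** 2 for c in range(len(target)))
        for p in aug_probs
    ]
    return sum(squared) / len(aug_probs)


def total_loss(labeled, unlabeled, lam=1.0, temperature=0.5):
    """labeled: list of (aug_probs, label); unlabeled: list of aug_probs.

    lam is a hypothetical weight balancing the two terms.
    """
    sup = sum(supervised_loss(a, y) for a, y in labeled) / len(labeled)
    con = sum(consistency_loss(a, temperature) for a in unlabeled) / len(unlabeled)
    return sup + lam * con
```

Here each `aug_probs` is a list of class-probability vectors for one node, one vector per augmentation strategy. Sharpening with a temperature below 1 pushes the fused target toward its most likely class, so the consistency term rewards confident, mutually agreeing predictions across augmentations.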
Pages: 27