Imbalanced Node Classification With Synthetic Over-Sampling

Cited by: 0
Authors
Zhao, Tianxiang [1 ]
Zhang, Xiang [1 ]
Wang, Suhang [1 ]
Affiliation
[1] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation;
Keywords
Task analysis; Training; Interpolation; Chatbots; Classification algorithms; Topology; Training data; Node classification; imbalanced learning; graph; data augmentation; graph neural network; SHORTEST-PATH; ROAD NETWORKS; QUERIES; DECOMPOSITION; MAINTENANCE; INDEX;
DOI
10.1109/TKDE.2024.3443160
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, graph neural networks (GNNs) have achieved state-of-the-art performance for node classification. However, most existing GNNs suffer from the graph imbalance problem. In many real-world scenarios, node classes are imbalanced, with a few majority classes accounting for most of the graph. The message propagation mechanism in GNNs further amplifies the dominance of those majority classes, resulting in sub-optimal classification performance. In this work, we seek to address this problem by generating pseudo instances of minority classes to balance the training data, extending previous over-sampling-based techniques. This task is non-trivial, as those techniques are designed under the assumption that instances are independent, and neglecting relation information would complicate the over-sampling process. Furthermore, node classification typically takes a semi-supervised setting with only a few labeled nodes, providing insufficient supervision for the generation of minority instances, and low-quality generated nodes would harm the trained classifier. We address these difficulties by synthesizing new nodes in a constructed embedding space that encodes both node attributes and topology information. An edge generator is trained simultaneously to model the graph structure and provide relations for the new samples. To further improve data efficiency, we also explore synthesizing mixed "in-between" nodes so that nodes from the majority class can be used in the over-sampling process. Experiments on real-world datasets validate the effectiveness of our proposed framework.
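The over-sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes node embeddings are already available as a NumPy array, uses plain SMOTE-style nearest-neighbour interpolation for the synthetic nodes, and stands in an untrained inner-product scorer for the learned edge generator. The function names `oversample_minority` and `predict_edges`, the threshold, and the toy data are all hypothetical.

```python
import numpy as np


def oversample_minority(emb, labels, minority_class, n_new, rng):
    """Create n_new synthetic embeddings by SMOTE-style interpolation."""
    idx = np.flatnonzero(labels == minority_class)
    if idx.size < 2:
        raise ValueError("need at least two minority nodes to interpolate")
    pts = emb[idx]
    # Pairwise Euclidean distances among minority embeddings.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbour
    nearest = dist.argmin(axis=1)
    # Pick random source nodes and move each part-way to its nearest neighbour.
    src = rng.integers(0, idx.size, size=n_new)
    delta = rng.random((n_new, 1))  # interpolation weights in [0, 1)
    return pts[src] + delta * (pts[nearest[src]] - pts[src])


def predict_edges(new_emb, emb, threshold=0.5):
    """Assumed stand-in for an edge generator: thresholded sigmoid of inner products."""
    scores = 1.0 / (1.0 + np.exp(-(new_emb @ emb.T)))
    return scores > threshold


# Toy data (assumption): 20 nodes, 4-dimensional embeddings, 3 minority nodes.
rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 4))
labels = np.array([0] * 17 + [1] * 3)

# Generate 14 synthetic minority nodes so both classes have 17 members.
new_emb = oversample_minority(emb, labels, minority_class=1, n_new=14, rng=rng)
new_adj = predict_edges(new_emb, emb)  # boolean rows: links from new nodes to existing ones
```

Interpolating in an embedding space rather than in raw feature space is what lets the synthetic nodes reflect topology as well as attributes; in the sketch that only holds if `emb` was produced by an encoder that saw the graph structure.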
Pages: 8515 - 8528
Page count: 14
Related papers
50 in total
  • [21] An overlapping minimization-based over-sampling algorithm for binary imbalanced classification
    Lu, Xuan
    Ye, Xuan
    Cheng, Yingchao
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [22] An Over-sampling Method Based on Probability Density Estimation for Imbalanced Datasets Classification
    Cao, Lu
    Zhai, Yi-Kui
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING (ICIIP'16), 2016,
  • [23] Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis
    Huang, Zhaoke
    Yang, Chunhua
    Chen, Xiaofang
    Huang, Keke
    Xie, Yongfang
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (11): 7183 - 7199
  • [24] A Normal Distribution-Based Over-Sampling Approach to Imbalanced Data Classification
    Zhang, Huaxiang
    Wang, Zhichao
    ADVANCED DATA MINING AND APPLICATIONS, PT I, 2011, 7120 : 83 - 96
  • [25] Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis
    Zhaoke Huang
    Chunhua Yang
    Xiaofang Chen
    Keke Huang
    Yongfang Xie
    Neural Computing and Applications, 2020, 32 : 7183 - 7199
  • [26] Multilabel Over-sampling and Under-sampling with Class Alignment for Imbalanced Multilabel Text Classification
    Taha, Adil Yaseen
    Tiun, Sabrina
    Abd Rahman, Abdul Hadi
    Sabah, Ali
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2021, 20 (03): 423 - 456
  • [27] Handling Autism Imbalanced Data using Synthetic Minority Over-Sampling Technique (SMOTE)
    El-Sayed, Asmaa Ahmed
    Meguid, Nagwa Abdel
    Mahmood, Mahmood Abdel Manem
    Hefny, Hesham Ahmed
    PROCEEDINGS OF 2015 THIRD IEEE WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2015,
  • [28] SROT: Sparse representation-based over-sampling technique for classification of imbalanced dataset
    Zou, Xionggao
    Feng, Yueping
    Li, Huiying
    Jiang, Shuyu
    2ND INTERNATIONAL CONFERENCE ON MATERIALS SCIENCE, ENERGY TECHNOLOGY AND ENVIRONMENTAL ENGINEERING (MSETEE 2017), 2017, 81
  • [29] KA-Ensemble: towards imbalanced image classification ensembling under-sampling and over-sampling
    Hao Ding
    Bin Wei
    Zhaorui Gu
    Zhibin Yu
    Haiyong Zheng
    Bing Zheng
    Juan Li
    Multimedia Tools and Applications, 2020, 79 : 14871 - 14888
  • [30] Deep Over-sampling Framework for Classifying Imbalanced Data
    Ando, Shin
    Huang, Chun Yuan
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2017, PT I, 2017, 10534 : 770 - 785