Imbalanced Node Classification With Synthetic Over-Sampling

Cited by: 0
Authors
Zhao, Tianxiang [1]
Zhang, Xiang [1]
Wang, Suhang [1]
Affiliation
[1] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Funding
U.S. National Science Foundation
Keywords
Task analysis; Training; Interpolation; Chatbots; Classification algorithms; Topology; Training data; Node classification; imbalanced learning; graph; data augmentation; graph neural network; SHORTEST-PATH; ROAD NETWORKS; QUERIES; DECOMPOSITION; MAINTENANCE; INDEX;
DOI
10.1109/TKDE.2024.3443160
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In recent years, graph neural networks (GNNs) have achieved state-of-the-art performance for node classification. However, most existing GNNs suffer from the graph imbalance problem. In many real-world scenarios, node classes are imbalanced, with some majority classes making up most of the graph. The message propagation mechanism in GNNs can further amplify the dominance of these majority classes, resulting in sub-optimal classification performance. In this work, we address this problem by generating pseudo-instances of minority classes to balance the training data, extending previous over-sampling-based techniques. This task is non-trivial because those techniques assume that instances are independent; ignoring the relational information between nodes complicates the over-sampling process. Furthermore, node classification is typically semi-supervised, with only a few labeled nodes, which provides insufficient supervision for generating minority instances, and low-quality generated nodes would harm the trained classifier. We address these difficulties by synthesizing new nodes in a constructed embedding space that encodes both node attributes and topology information. In addition, an edge generator is trained simultaneously to model the graph structure and provide relations for the new samples. To further improve data efficiency, we also explore synthesizing mixed "in-between" nodes so that nodes from the majority classes can also be used in the over-sampling process. Experiments on real-world datasets validate the effectiveness of the proposed framework.
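The abstract describes the approach only at a high level. The sketch below, assuming NumPy and a precomputed node-embedding matrix `z`, illustrates the three ideas it names: SMOTE-style interpolation between a minority node and its nearest same-class neighbor in embedding space, a mixup-style "in-between" node that blends a minority and a majority embedding, and a bilinear edge scorer that links synthetic nodes to existing ones. All function names (`oversample_minority`, `mix_in_between`, `generate_edges`), the fixed weight matrix `s`, the 0.5 threshold and the soft-label scheme are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def oversample_minority(z, y, minority_class, n_new, rng):
    """SMOTE-style interpolation between minority nodes in embedding space.

    z: (n, d) node embeddings (assumed to come from a GNN encoder).
    y: (n,) integer labels. Returns (n_new, d) synthetic minority embeddings.
    """
    zm = z[y == minority_class]
    if zm.shape[0] < 2:
        raise ValueError("need at least two minority nodes to interpolate")
    # Squared distances among minority nodes; exclude each node from its own search.
    d2 = ((zm[:, None, :] - zm[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    nearest = d2.argmin(axis=1)  # nearest same-class neighbor of each minority node
    seeds = rng.integers(0, zm.shape[0], size=n_new)
    delta = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return zm[seeds] + delta * (zm[nearest[seeds]] - zm[seeds])


def mix_in_between(z_min, z_maj, rng):
    """Mixup-style 'in-between' nodes (hypothetical helper).

    Returns the mixed embeddings and the weight placed on the minority class,
    which can serve as a soft label.
    """
    lam = rng.random((z_min.shape[0], 1))
    return lam * z_min + (1.0 - lam) * z_maj, lam.ravel()


def generate_edges(z_new, z, s, threshold=0.5):
    """Bilinear edge scorer sigmoid(z_u^T S z_v), thresholded to a 0/1 matrix."""
    logits = z_new @ s @ z.T
    probs = 0.5 * (1.0 + np.tanh(0.5 * logits))  # numerically stable sigmoid
    return (probs > threshold).astype(np.int8)


# Toy usage: 17 majority nodes (class 0) and 3 minority nodes (class 1).
rng = np.random.default_rng(0)
z = rng.normal(size=(20, 8))
y = np.array([0] * 17 + [1] * 3)

z_new = oversample_minority(z, y, minority_class=1, n_new=14, rng=rng)
s = np.eye(8)  # stand-in for a trained weight matrix (assumption)
adj_new = generate_edges(z_new, z, s)

z_min = z[y == 1][rng.integers(0, 3, size=5)]
z_maj = z[y == 0][rng.integers(0, 17, size=5)]
z_mix, soft = mix_in_between(z_min, z_maj, rng)

print(z_new.shape, adj_new.shape)  # (14, 8) (14, 20)
```

Using the identity matrix for `s` reduces the scorer to a plain dot-product similarity. In the framework described above, the edge generator is trained simultaneously with the rest of the model, and the embedding space itself is learned, which is what makes interpolating in it meaningful; random embeddings are used here only to keep the example self-contained.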
Pages: 8515-8528
Page count: 14
Related papers (50 in total)
  • [41] Cluster-Based Minority Over-Sampling for Imbalanced Datasets
    Puntumapon, Kamthorn
    Rakthamamon, Thanawin
    Waiyamai, Kitsana
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (12): 3101-3109
  • [42] Imbalanced Data Over-Sampling Method Based on ISODATA Clustering
    Lv, Zhenzhe
    Liu, Qicheng
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (09): 1528-1536
  • [43] BCGAN-based Over-sampling Scheme for Imbalanced Data
    Son, Minjae
    Jung, Seungwon
    Moon, Jihoon
    Hwang, Eenjun
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020: 155-160
  • [44] Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification
    Alex, Suja A.
    Nayahi, J. Jesu Vedha
    Kaddoura, Sanaa
    APPLIED SOFT COMPUTING, 2024, 156
  • [45] Affine combination-based over-sampling for imbalanced regression
    Li, Zhen-Zhen
    Huang, Niu
    Yi, Lun-Zhao
    Fu, Guang-Hui
    JOURNAL OF CHEMOMETRICS, 2024, 38 (03)
  • [46] A sparrow search algorithm-optimized convolutional neural network for imbalanced data classification using synthetic minority over-sampling technique
    Deng, Wu
    He, Qi
    Zhou, Xiangbing
    Chen, Huayue
    Zhao, Huimin
    PHYSICA SCRIPTA, 2023, 98 (11)
  • [47] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    2002, American Association for Artificial Intelligence (16)
  • [48] An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    ANALYTICA CHIMICA ACTA, 2014, 806: 117-127
  • [49] Synthetic Minority Over-Sampling Technique based on Fuzzy C-means Clustering for Imbalanced Data
    Lee, Hansoo
    Jung, Seunghyan
    Kim, Minseok
    Kimt, Sungshin
    2017 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), 2017
  • [50] Preprocessing of Imbalanced Breast Cancer Data using Feature Selection Combined with Over-Sampling Technique for classification
    Jojan, Janjira
    Srivihok, Anongnart
    2013 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2013: 407-412