Imbalanced Node Classification With Synthetic Over-Sampling

Cited by: 0
Authors
Zhao, Tianxiang [1]
Zhang, Xiang [1]
Wang, Suhang [1]
Affiliation
[1] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Funding
National Science Foundation (USA)
Keywords
Task analysis; Training; Interpolation; Chatbots; Classification algorithms; Topology; Training data; Node classification; imbalanced learning; graph; data augmentation; graph neural network; SHORTEST-PATH; ROAD NETWORKS; QUERIES; DECOMPOSITION; MAINTENANCE; INDEX;
DOI
10.1109/TKDE.2024.3443160
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, graph neural networks (GNNs) have achieved state-of-the-art performance in node classification. However, most existing GNNs suffer from the graph imbalance problem. In many real-world scenarios, node classes are imbalanced, with a few majority classes making up most of the graph. The message propagation mechanism in GNNs further amplifies the dominance of those majority classes, resulting in sub-optimal classification performance. In this work, we seek to address this problem by generating pseudo instances of minority classes to balance the training data, extending previous over-sampling-based techniques. This task is non-trivial, as those techniques are designed under the assumption that instances are independent, and neglecting relation information complicates the over-sampling process. Furthermore, node classification is typically semi-supervised, with only a few labeled nodes, which provides insufficient supervision for generating minority instances, and low-quality generated nodes would harm the trained classifier. We address these difficulties by synthesizing new nodes in a constructed embedding space that encodes both node attributes and topology information. An edge generator is trained simultaneously to model the graph structure and provide relations for the new samples. To further improve data efficiency, we also explore synthesizing mixed "in-between" nodes so that nodes from the majority class can be used in the over-sampling process. Experiments on real-world datasets validate the effectiveness of the proposed framework.
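The over-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes node embeddings are already available as a NumPy array, uses a SMOTE-style interpolation between a minority node and its nearest same-class neighbour, and stands in for the trained edge generator with a hypothetical bilinear score (here an untrained identity weight matrix). The function names `smote_in_embedding` and `predict_edges`, the threshold, and the toy data are all illustrative assumptions.

```python
import numpy as np


def smote_in_embedding(emb, labels, minority_class, n_new, rng):
    """Create n_new synthetic minority embeddings by interpolating each
    sampled minority node toward its nearest same-class neighbour."""
    idx = np.flatnonzero(labels == minority_class)
    if idx.size < 2:
        raise ValueError("need at least two minority nodes to interpolate")
    minority = emb[idx]
    # Pairwise Euclidean distances among minority embeddings.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbour
    nearest = dist.argmin(axis=1)
    seeds = rng.integers(0, idx.size, size=n_new)
    delta = rng.random((n_new, 1))  # interpolation weights in [0, 1)
    return (1.0 - delta) * minority[seeds] + delta * minority[nearest[seeds]]


def predict_edges(new_emb, emb, weight, threshold=0.5):
    """Hypothetical edge generator: sigmoid of a bilinear score between
    synthetic and existing nodes. `weight` would be learned in practice."""
    logits = new_emb @ weight @ emb.T
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > threshold


# Toy data (assumed): 10 nodes, 4-dimensional embeddings, 2 minority nodes.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 4))
labels = np.array([0] * 8 + [1] * 2)

new_emb = smote_in_embedding(emb, labels, minority_class=1, n_new=6, rng=rng)
adj_new = predict_edges(new_emb, emb, weight=np.eye(4))
print(new_emb.shape, adj_new.shape)  # (6, 4) (6, 10)
```

Interpolating in the embedding space rather than the raw feature space is what lets the synthetic nodes reflect both attributes and topology, and the predicted adjacency rows are what would connect them to the existing graph before training the classifier.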
Pages: 8515-8528
Number of pages: 14