ALLIE: Active Learning on Large-scale Imbalanced Graphs

Cited by: 9
Authors
Cui, Limeng [1]
Tang, Xianfeng [2]
Katariya, Sumeet [2]
Rao, Nikhil [2]
Agrawal, Pallav [2]
Subbian, Karthik [2]
Lee, Dongwon [1]
Affiliations
[1] Penn State Univ, Philadelphia, PA 16801 USA
[2] Amazon, Seattle, WA USA
Funding
National Science Foundation (USA);
Keywords
Graph neural networks; fraud detection; active learning; reinforcement learning; REDUCTION;
DOI
10.1145/3485447.3512229
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Human labeling is time-consuming and costly. This problem is further exacerbated in extremely imbalanced class label scenarios, such as detecting fraudsters on online websites. Active learning selects the most relevant examples for human labelers to improve the model performance at a lower cost. However, existing methods for active learning on graph data often assume that both data and label distributions are balanced. These assumptions fail in extreme rare-class classification scenarios, such as classifying abusive reviews on an e-commerce website. We propose a novel framework, ALLIE, to address this challenge of active learning in large-scale imbalanced graph data. In our approach, we efficiently sample from both majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function. We employ focal loss in the node classification model in order to focus more on the rare class and improve the accuracy of the downstream model. Finally, we use a graph coarsening strategy to reduce the search space of the reinforcement learning agent. We conduct extensive experiments on benchmark graph datasets and real-world e-commerce datasets. ALLIE significantly outperforms state-of-the-art graph-based active learning methods, with up to 10% improvement in F1 score for the positive class. We also validate ALLIE on a proprietary e-commerce graph dataset by tasking it to detect abuse. Our coarsening strategy reduces the computational time by up to 38% on both proprietary and public datasets.
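Note on the focal loss mentioned in the abstract: the abstract does not give the exact formulation used in ALLIE, so the following is only a minimal sketch of the commonly used binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name `binary_focal_loss` and the default values `gamma=2.0` and `alpha=0.25` are assumptions chosen for illustration, not values taken from the paper.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example (illustrative sketch, not the paper's code).

    p: predicted probability of the positive (rare) class, in [0, 1]
    y: true label, 0 or 1
    gamma, alpha: assumed default hyperparameters
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = binary_focal_loss(0.9, 1)   # confident, correct prediction
    hard = binary_focal_loss(0.1, 1)   # confident, wrong prediction
    print(easy < hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is one way to see how the focusing term shifts weight toward hard, typically rare-class, examples.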
Pages: 690 - 698
Page count: 9