ALLIE: Active Learning on Large-scale Imbalanced Graphs

Cited by: 9
Authors
Cui, Limeng [1 ]
Tang, Xianfeng [2 ]
Katariya, Sumeet [2 ]
Rao, Nikhil [2 ]
Agrawal, Pallav [2 ]
Subbian, Karthik [2 ]
Lee, Dongwon [1 ]
Institutions
[1] Penn State Univ, Philadelphia, PA 16801 USA
[2] Amazon, Seattle, WA USA
Funding
National Science Foundation (NSF);
Keywords
Graph neural networks; fraud detection; active learning; reinforcement learning; REDUCTION;
DOI
10.1145/3485447.3512229
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Human labeling is time-consuming and costly. This problem is further exacerbated in scenarios with extremely imbalanced class labels, such as detecting fraudsters on online websites. Active learning selects the most relevant examples for human labelers in order to improve model performance at a lower cost. However, existing active learning methods for graph data often assume that both the data and label distributions are balanced. These assumptions fail in extreme rare-class classification scenarios, such as classifying abusive reviews on an e-commerce website. We propose ALLIE, a novel framework that addresses active learning on large-scale imbalanced graph data. In our approach, we efficiently sample from both the majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function. We employ focal loss in the node classification model so that it focuses more on the rare class, improving the accuracy of the downstream model. Finally, we use a graph coarsening strategy to reduce the search space of the reinforcement learning agent. We conduct extensive experiments on benchmark graph datasets and real-world e-commerce datasets. ALLIE significantly outperforms state-of-the-art graph-based active learning methods, with up to a 10% improvement in F1 score for the positive class. We also validate ALLIE on proprietary e-commerce graph data by tasking it with detecting abuse. Our coarsening strategy reduces computational time by up to 38% on both the proprietary and public datasets.
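The abstract states that ALLIE uses focal loss to make the node classifier concentrate on the rare class, but it does not give the exact form or hyperparameters. As a minimal sketch, assuming the standard alpha-balanced binary focal loss of Lin et al. (2017), FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), the Python function below computes the mean loss over a batch of node predictions. The function name `binary_focal_loss` and the defaults alpha = 0.25, gamma = 2 (the values reported by Lin et al.) are illustrative assumptions, not ALLIE's published implementation or settings.

```python
import math
from typing import Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int],
                      alpha: float = 0.25, gamma: float = 2.0,
                      eps: float = 1e-12) -> float:
    """Mean alpha-balanced binary focal loss (Lin et al., 2017).

    Hypothetical helper for illustration; not taken from the ALLIE code.
    probs:  predicted probability of the positive (rare) class per node
    labels: 1 for positive (e.g. abusive/fraudulent), 0 for negative
    alpha:  weight on positives; negatives are weighted by (1 - alpha)
    gamma:  focusing parameter; gamma = 0 gives alpha-weighted cross-entropy
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        p_t = min(max(p_t, eps), 1.0)  # clamp so log(0) is never evaluated
        # (1 - p_t) ** gamma down-weights easy, confidently correct examples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# Usage: an easy negative versus a misclassified rare-class positive.
easy_negative = binary_focal_loss([0.05], [0])
hard_positive = binary_focal_loss([0.3], [1])
print(easy_negative < hard_positive)
```

For the easy negative (p = 0.05, y = 0), the modulating factor is (1 - 0.95)^2 = 0.0025, which shrinks its loss sharply, whereas the misclassified positive (p = 0.3, y = 1) keeps a factor of 0.7^2 = 0.49. Training is therefore dominated by hard examples, which in an extremely imbalanced graph tend to include the rare class.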
Pages: 690-698
Number of pages: 9
Related papers (50 in total)
  • [21] An Active Learning Based LDA Algorithm for Large-Scale Data Classification
    Yu X.
    Zhou Y.-P.
    Ren C.-N.
    1600, Science and Engineering Research Support Society (09): : 29 - 36
  • [22] In operando active learning of interatomic interaction during large-scale simulations
    Hodapp, M.
    Shapeev, A.
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2020, 1 (04):
  • [23] Fast Pairwise Query Selection for Large-Scale Active Learning to Rank
    Qian, Buyue
    Wang, Xiang
    Wang, Jun
    Li, Hongfei
    Cao, Nan
    Zhi, Weifeng
    Davidson, Ian
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013, : 607 - 616
  • [24] Large-Scale Active Learning with Approximations of Expected Model Output Changes
    Kaeding, Christoph
    Freytag, Alexander
    Rodner, Erik
    Perino, Andrea
    Denzler, Joachim
    PATTERN RECOGNITION, GCPR 2016, 2016, 9796 : 179 - 191
  • [25] Ensemble Learning on Large Scale Financial Imbalanced Data
    Sanabila, H. R.
    Jatmiko, Wisnu
    2018 INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS), 2018, : 93 - 98
  • [26] A Fast Distributed Classification Algorithm for Large-scale Imbalanced Data
    Wang, Huihui
    Gao, Yang
    Shi, Yinghuan
    Wang, Hao
    2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016, : 1251 - 1256
  • [27] Group Centrality Maximization for Large-scale Graphs
    Angriman, Eugenio
    van der Grinten, Alexander
    Bojchevski, Aleksandar
    Zuegner, Daniel
    Guennemann, Stephan
    Meyerhenke, Henning
    2020 PROCEEDINGS OF THE SYMPOSIUM ON ALGORITHM ENGINEERING AND EXPERIMENTS, ALENEX, 2020, : 56 - 69
  • [28] Readable representations for large-scale bipartite graphs
    Sato, Shuji
    Misue, Kazuo
    Tanaka, Jiro
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS, 2008, 5178 : 831 - 838
  • [29] Understanding Coarsening for Embedding Large-Scale Graphs
    Akyildiz, Taha Atahan
    Aljundi, Amro Alabsi
    Kaya, Kamer
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 2937 - 2946
  • [30] Generating Large-Scale Heterogeneous Graphs for Benchmarking
    Gupta, Amarnath
    SPECIFYING BIG DATA BENCHMARKS, 2014, 8163 : 113 - 128