ALLIE: Active Learning on Large-scale Imbalanced Graphs

Cited by: 9
Authors
Cui, Limeng [1 ]
Tang, Xianfeng [2 ]
Katariya, Sumeet [2 ]
Rao, Nikhil [2 ]
Agrawal, Pallav [2 ]
Subbian, Karthik [2 ]
Lee, Dongwon [1 ]
Affiliations
[1] Penn State Univ, Philadelphia, PA 16801 USA
[2] Amazon, Seattle, WA USA
Funding
US National Science Foundation (NSF);
Keywords
Graph neural networks; fraud detection; active learning; reinforcement learning; REDUCTION;
DOI
10.1145/3485447.3512229
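Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812 (Computer Science and Technology);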
Abstract
Human labeling is time-consuming and costly. This problem is further exacerbated in extremely imbalanced class-label scenarios, such as detecting fraudsters on online websites. Active learning selects the most relevant examples for human labelers so that model performance improves at a lower cost. However, existing active learning methods for graph data often assume that both the data and label distributions are balanced. These assumptions fail in extreme rare-class classification scenarios, such as classifying abusive reviews on an e-commerce website. We propose ALLIE, a novel framework that addresses active learning on large-scale imbalanced graph data. In our approach, we efficiently sample from both the majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function. We employ focal loss in the node classification model to focus more on the rare class and improve the accuracy of the downstream model. Finally, we use a graph coarsening strategy to reduce the search space of the reinforcement learning agent. We conduct extensive experiments on benchmark graph datasets and real-world e-commerce datasets. ALLIE significantly outperforms state-of-the-art graph-based active learning methods, with up to a 10% improvement in F1 score for the positive class. We also validate ALLIE on proprietary e-commerce graph data by tasking it with detecting abuse. Our coarsening strategy reduces computational time by up to 38% on both the proprietary and public datasets.
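The abstract names focal loss as the node-classification objective but gives no formula or hyperparameters. The following is a minimal sketch of the standard binary focal loss of Lin et al. (2017), FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). Here p_t is the predicted probability of the true class, and alpha_t weights the positive (rare) class by alpha and the negative class by 1 - alpha. The function name and the defaults alpha = 0.25 and gamma = 2.0 are assumptions, taken from that original focal-loss paper rather than from ALLIE's reported settings.

```python
import math

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over a batch (hypothetical helper, not ALLIE's code).

    p: predicted probabilities of the positive (rare, e.g. abusive) class
    y: labels, 1 = positive/rare class, 0 = negative/majority class
    alpha, gamma: assumed defaults from Lin et al. (2017), not ALLIE's values
    """
    if len(p) != len(y) or not p:
        raise ValueError("p and y must be non-empty and of equal length")
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1.0 - eps)            # clamp to avoid log(0)
        p_t = pi if yi == 1 else 1.0 - pi            # probability of the true class
        alpha_t = alpha if yi == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t)^gamma shrinks the loss of confidently correct (easy) examples
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(p)

# A confidently correct positive vs. a badly misclassified positive:
easy = binary_focal_loss([0.9], [1])
hard = binary_focal_loss([0.1], [1])
print(easy < hard)
```

With gamma = 0 and alpha = 0.5, the expression reduces to half the ordinary binary cross-entropy. The modulating factor (1 - p_t)^gamma is what lets the many easy, mostly majority-class nodes contribute little to the gradient, so training attention shifts toward the rare class.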
Pages: 690-698
Number of pages: 9
Related papers (50 in total)
  • [1] Efficient Machine Learning On Large-Scale Graphs
    Erickson, Parker
    Lee, Victor E.
    Shi, Feng
    Tang, Jiliang
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4788 - 4789
  • [2] Large-scale Machine Learning over Graphs
    Yang, Yiming
    PROCEEDINGS OF THE 2018 ACM SIGIR INTERNATIONAL CONFERENCE ON THEORY OF INFORMATION RETRIEVAL (ICTIR'18), 2018, : 9 - 9
  • [3] ActiveReach: an active learning framework for approximate reachability query answering in large-scale graphs
    Raghebi, Zohreh
    Banaei-Kashani, Farnoush
    FRONTIERS IN BIG DATA, 2024, 7
  • [4] ACTIVE LEARNING FOR LARGE-SCALE FACTOR ANALYSIS
    Silva, Jorge
    Carin, Lawrence
    2012 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2012, : 161 - 164
  • [5] Active Learning for Large-Scale Entity Resolution
    Qian, Kun
    Popa, Lucian
    Sen, Prithviraj
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 1379 - 1388
  • [6] Large-Scale Learning with Structural Kernels for Class-Imbalanced Datasets
    Severyn, Aliaksei
    Moschitti, Alessandro
    ETERNAL SYSTEMS, 2012, 255 : 34 - 41
  • [7] Large-Scale Image Classification Using Active Learning
    Alajlan, Naif
    Pasolli, Edoardo
    Melgani, Farid
    Franzoso, Andrea
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2014, 11 (01) : 259 - 263
  • [8] Applying Active Learning Strategy to Classify Large Scale Data with Imbalanced Classes
    Tuntiwachiratrakun, Phairod
    Vateekul, Peerapon
    2016 INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), 2016, : 100 - 105
  • [9] An Active Learning Method Based on Mistake Sampling for Large Scale Imbalanced Classification
    Guo, Jia
    Wan, Xin
    Lin, Hao
    Li, Peng
    Liu, Guannan
    He, Yueying
    2017 14TH INTERNATIONAL CONFERENCE ON SERVICES SYSTEMS AND SERVICES MANAGEMENT (ICSSSM), 2017,
  • [10] AutoGMap: Learning to Map Large-Scale Sparse Graphs on Memristive Crossbars
    Lyu, Bo
    Wang, Shengbo
    Wen, Shiping
    Shi, Kaibo
    Yang, Yin
    Zeng, Lingfang
    Huang, Tingwen
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12888 - 12898