ALLIE: Active Learning on Large-scale Imbalanced Graphs

Cited by: 9

Authors:

Cui, Limeng [1]

Tang, Xianfeng [2]

Katariya, Sumeet [2]

Rao, Nikhil [2]

Agrawal, Pallav [2]

Subbian, Karthik [2]

Lee, Dongwon [1]

Affiliations:

[1] Penn State Univ, Philadelphia, PA 16801 USA

[2] Amazon, Seattle, WA USA

Source:

PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22) | 2022

Funding:

U.S. National Science Foundation

Keywords:

Graph neural networks; fraud detection; active learning; reinforcement learning; REDUCTION;

DOI:

10.1145/3485447.3512229

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Human labeling is time-consuming and costly. This problem is further exacerbated in extremely imbalanced class label scenarios, such as detecting fraudsters on online websites. Active learning selects the most relevant examples for human labelers to improve model performance at a lower cost. However, existing active learning methods for graph data often assume that both data and label distributions are balanced. These assumptions fail in extreme rare-class classification scenarios, such as classifying abusive reviews on an e-commerce website. We propose a novel framework, ALLIE, to address this challenge of active learning on large-scale imbalanced graph data. In our approach, we efficiently sample from both majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function. We employ focal loss in the node classification model in order to focus more on the rare class and improve the accuracy of the downstream model. Finally, we use a graph coarsening strategy to reduce the search space of the reinforcement learning agent. We conduct extensive experiments on benchmark graph datasets and real-world e-commerce datasets. ALLIE significantly outperforms state-of-the-art graph-based active learning methods, with up to 10% improvement in F1 score for the positive class. We also validate ALLIE on a proprietary e-commerce graph dataset by tasking it to detect abuse. Our coarsening strategy reduces the computational time by up to 38% on both proprietary and public datasets.
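The abstract mentions focal loss as the way the node classifier is made to focus on the rare class. As a rough illustration only, the sketch below implements the standard binary focal loss for a single example. The function name `binary_focal_loss` and the default values of `gamma` and `alpha` are assumptions chosen for illustration; the record does not state the formulation or hyperparameters actually used in ALLIE.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for one example (illustrative sketch).

    p     -- predicted probability of the positive (rare) class
    y     -- true label, 0 or 1
    gamma -- focusing parameter (assumed value); 0 disables the focusing term
    alpha -- weight on the positive class (assumed value)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)        # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of examples that are already well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct positive contributes far less than a badly missed one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
print(easy < hard)
```

With `gamma=0` and `alpha=0.5` the expression reduces to half of the ordinary cross-entropy, which is a convenient sanity check for the sketch.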

Pages: 690-698

Page count: 9
