ALLIE: Active Learning on Large-scale Imbalanced Graphs

Cited by: 9
Authors
Cui, Limeng [1]
Tang, Xianfeng [2]
Katariya, Sumeet [2]
Rao, Nikhil [2]
Agrawal, Pallav [2]
Subbian, Karthik [2]
Lee, Dongwon [1]
Affiliations
[1] Penn State Univ, Philadelphia, PA 16801 USA
[2] Amazon, Seattle, WA USA
Funding
US National Science Foundation
Keywords
Graph neural networks; fraud detection; active learning; reinforcement learning; REDUCTION
DOI
10.1145/3485447.3512229
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Human labeling is time-consuming and costly. This problem is further exacerbated in scenarios with extremely imbalanced class labels, such as detecting fraudsters on websites. Active learning selects the most relevant examples for human labelers, improving model performance at a lower cost. However, existing active learning methods for graph data often assume that both the data and label distributions are balanced. These assumptions fail in extreme rare-class classification scenarios, such as classifying abusive reviews on an e-commerce website. We propose a novel framework, ALLIE, to address the challenge of active learning on large-scale imbalanced graph data. In our approach, we efficiently sample from both majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function. We employ focal loss in the node classification model to focus more on the rare class and improve the accuracy of the downstream model. Finally, we use a graph coarsening strategy to reduce the search space of the reinforcement learning agent. We conduct extensive experiments on benchmark graph datasets and real-world e-commerce datasets. ALLIE significantly outperforms state-of-the-art graph-based active learning methods, with up to a 10% improvement in F1 score for the positive class. We also validate ALLIE on a proprietary e-commerce graph dataset by tasking it with detecting abuse. Our coarsening strategy reduces computation time by up to 38% on both proprietary and public datasets.
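As an illustration of why focal loss helps with a rare class, here is a minimal sketch of the standard binary focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t). The abstract does not give the exact formulation or hyperparameters used in ALLIE, so the function names, the value gamma = 2.0, and the optional class weight `alpha` are assumptions made for this example only.

```python
import math


def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy for one example.

    p: predicted probability of the positive (rare) class.
    y: true label, 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    return -math.log(min(max(p_t, eps), 1.0))


def binary_focal_loss(p, y, gamma=2.0, alpha=None, eps=1e-12):
    """Binary focal loss for one example (hypothetical helper, not ALLIE's code).

    gamma: focusing parameter; gamma = 0 recovers plain cross-entropy.
    alpha: optional weight on the positive class (1 - alpha on the negative
           class); None disables class weighting. Both values are assumptions.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)
    weight = 1.0 if alpha is None else (alpha if y == 1 else 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples.
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    examples = {
        "easy negative (p=0.05, y=0)": (0.05, 0),
        "hard positive (p=0.10, y=1)": (0.10, 1),
    }
    for name, (p, y) in examples.items():
        ce = cross_entropy(p, y)
        fl = binary_focal_loss(p, y)
        print(f"{name}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```

With gamma = 2, the loss of an example is scaled by (1 - p_t)^2, so a confidently correct majority-class node contributes far less than a misclassified rare-class node, which is the "focus more on the rare class" effect described in the abstract.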
Pages: 690-698
Page count: 9