DynaQ: online learning from imbalanced multi-class streams through dynamic sampling

Cited by: 3
Authors
Sadeghi, Farnaz [1 ]
Viktor, Herna L. [1 ]
Vafaie, Parsa [1 ]
Affiliations
[1] Univ Ottawa, Sch Elect Engn & Comp Sci, 800 King Edward Rd, Ottawa, ON K1N 6N5, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Online learning; Multi-class imbalance; Data streams; Ensembles; Concept drift; CLASSIFICATION; ENSEMBLE; INFORMATION; CHALLENGES; SELECTION;
DOI
10.1007/s10489-023-04886-w
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Online supervised learning from fast-evolving data streams, particularly in domains such as health, the environment, and manufacturing, is a crucial research area. However, these domains often experience class imbalance, which can skew class distributions. It is essential for online learning algorithms to analyze large datasets in real-time while accurately modeling rare or infrequent classes that may appear in bursts. While methods have been proposed to handle binary class imbalance, there is a lack of attention to multi-class imbalanced settings with varying degrees of imbalance in evolving streams. In this paper, we present the Dynamic Queues (DynaQ) algorithm for online learning in multi-class imbalanced settings to fill this knowledge gap. Our approach utilizes a batch-based resampling method that creates an instance queue for each class to balance the number of instances. We maintain a queue threshold and remove older samples during training. Additionally, we dynamically oversample minority classes based on one of four rate parameters: recall, F1-score, κm, and Euclidean distance. Our learning algorithm consists of an ensemble that uses sliding windows and a soft voting schema while incorporating a drift detection mechanism. Our experimental results demonstrate the superiority of the DynaQ approach over state-of-the-art methods.
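The per-class queue idea described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the class name `ClassQueues`, the `max_len` capacity, and the oversampling rule `rate = 1 - recall` are all hypothetical choices made here for the example; the paper's actual rate computation, ensemble, and drift detection are not reproduced.

```python
import random
from collections import defaultdict, deque


class ClassQueues:
    """Per-class FIFO queues with a fixed capacity (hypothetical helper)."""

    def __init__(self, max_len, seed=0):
        self.max_len = max_len
        # deque(maxlen=...) drops the oldest item once the queue is full,
        # which mimics "remove older samples" beyond a queue threshold.
        self.queues = defaultdict(lambda: deque(maxlen=max_len))
        self.rng = random.Random(seed)

    def add(self, x, y):
        """Store instance x in the queue of its class label y."""
        self.queues[y].append(x)

    def balanced_batch(self, recall):
        """Return (x, label) pairs, oversampling classes with low recall.

        `recall` maps class label -> recall in [0, 1]. Using 1 - recall
        as the oversampling rate is an assumption for illustration only.
        """
        batch = []
        for label, queue in self.queues.items():
            items = list(queue)
            rate = 1.0 - recall.get(label, 0.0)
            # Fill the gap to max_len in proportion to the rate,
            # drawing extra copies with replacement.
            n_extra = round(rate * (self.max_len - len(items)))
            extra = [self.rng.choice(items) for _ in range(n_extra)]
            batch.extend((x, label) for x in items + extra)
        return batch


queues = ClassQueues(max_len=4)
for i in range(6):
    queues.add(i, "majority")      # only the 4 newest instances are kept
queues.add(100, "minority")        # a single rare-class instance

batch = queues.balanced_batch({"majority": 1.0, "minority": 0.0})
```

In this toy run the minority class, whose recall is assumed to be 0, is oversampled up to the queue capacity, so both classes contribute the same number of instances to the training batch.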
Pages: 24908-24930
Page count: 23