ASE: Anomaly scoring based ensemble learning for highly imbalanced datasets

Cited by: 1
Authors
Liang, Xiayu [1]
Gao, Ying [1]
Xu, Shanrong [1]
Affiliations
[1] South China Univ Technol, Guangzhou 510006, Peoples R China
Keywords
Ensemble learning; Imbalanced datasets; Resampling; Anomaly detection; Bagging; SAMPLING METHOD; DATA-SETS; CLASSIFICATION; SMOTE; STACKING;
DOI
10.1016/j.eswa.2023.122049
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification algorithms are now applied across many industries to solve real-world problems. However, in many binary classification tasks, minority-class samples make up only a small fraction of all instances, so the resulting datasets often have a high imbalance ratio. When the data are skewed, existing models sometimes treat minority-class samples as noise or ignore them as outliers. To address this problem, we propose ASE (Anomaly Scoring Based Ensemble Learning), a bagging ensemble learning framework. The framework uses a scoring system based on anomaly detection algorithms to guide the resampling strategy by dividing the majority-class samples into subspaces. A specific number of instances is then under-sampled from each subspace and combined with the minority class to construct training subsets. The weights of the base classifiers trained on these subsets are calculated from the classification result of the anomaly detection model and the statistics of the subspaces. Experiments show that our ensemble learning model can dramatically improve the performance of base classifiers and is more efficient than other existing methods across a wide range of imbalance ratios, data scales, and data dimensions. ASE can be combined with various classifiers, and every part of the framework has been shown to be reasonable and necessary.
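The resampling pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the anomaly score (distance from the majority centroid), the equal-size rank binning, the proportional per-subspace draw, the toy `CentroidClassifier`, and all function names are hypothetical stand-ins. The abstract does not give the classifier weighting formula, so the sketch falls back to an unweighted majority vote.

```python
import random
from statistics import mean

def anomaly_scores(points):
    # Assumption: stand-in score, Euclidean distance from the majority centroid.
    dim = len(points[0])
    centroid = [mean(p[d] for p in points) for d in range(dim)]
    return [sum((p[d] - centroid[d]) ** 2 for d in range(dim)) ** 0.5 for p in points]

def split_into_subspaces(points, scores, n_bins):
    # Assumption: subspaces are near-equal-size groups of samples ranked by score.
    order = sorted(range(len(points)), key=lambda i: scores[i])
    size = -(-len(order) // n_bins)  # ceiling division
    return [[points[i] for i in order[k:k + size]] for k in range(0, len(order), size)]

def build_subset(subspaces, minority, rng):
    # Under-sample every subspace so the majority draw is roughly minority-sized.
    total = sum(len(s) for s in subspaces)
    draw = []
    for s in subspaces:
        k = max(1, round(len(minority) * len(s) / total))
        draw.extend(rng.sample(s, min(k, len(s))))
    return draw, minority

class CentroidClassifier:
    # Toy base classifier (hypothetical): predicts the class with the closer centroid.
    def fit(self, majority, minority):
        dim = len(majority[0])
        self.c0 = [mean(p[d] for p in majority) for d in range(dim)]
        self.c1 = [mean(p[d] for p in minority) for d in range(dim)]
        return self

    def predict(self, x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        return 1 if d1 < d0 else 0

def fit_ensemble(majority, minority, n_estimators=5, n_bins=4, seed=0):
    rng = random.Random(seed)
    subspaces = split_into_subspaces(majority, anomaly_scores(majority), n_bins)
    members = []
    for _ in range(n_estimators):
        maj, mino = build_subset(subspaces, minority, rng)
        members.append(CentroidClassifier().fit(maj, mino))
    return members

def predict(members, x):
    # Assumption: unweighted majority vote; the paper's weights are not reproduced here.
    votes = sum(m.predict(x) for m in members)
    return 1 if votes * 2 > len(members) else 0

# Synthetic data with a 20:1 imbalance ratio: label 0 = majority, label 1 = minority.
data_rng = random.Random(42)
majority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
minority = [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(10)]
members = fit_ensemble(majority, minority)
```

Calling `predict(members, (5.0, 5.0))` queries the ensemble for a point near the minority cluster. In practice the stand-in score would be replaced by a real anomaly detector and the toy classifier by any base learner, which is the flexibility the abstract claims for ASE.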
Pages: 9
Related papers
50 in total
  • [41] Dynamic ensemble selection for multi-class imbalanced datasets
    Garcia, Salvador
    Zhang, Zhong-Liang
    Altalhi, Abdulrahman
    Alshomrani, Saleh
    Herrera, Francisco
    INFORMATION SCIENCES, 2018, 445 : 22 - 37
  • [42] Ensemble learning method based on CNN for class imbalanced data
    Zhong, Xin
    Wang, Nan
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (07): : 10090 - 10121
  • [43] Constructing support vector machine ensemble with segmentation for imbalanced datasets
    Qian Li
    Bing Yang
    Yi Li
    Naiyang Deng
    Ling Jing
    Neural Computing and Applications, 2013, 22 : 249 - 256
  • [44] Linguistic Steganalysis Based on Clustering and Ensemble Learning in Imbalanced Scenario
    Guo, Shengnan
    Chen, Xuekai
    Wang, Zhuang
    Yang, Zhongliang
    Zhou, Linna
    DIGITAL FORENSICS AND WATERMARKING, IWDW 2023, 2024, 14511 : 304 - 318
  • [45] Sample and feature selecting based ensemble learning for imbalanced problems
    Wang, Zhe
    Jia, Peng
    Xu, Xinlei
    Wang, Bolu
    Zhu, Yujin
    Li, Dongdong
    APPLIED SOFT COMPUTING, 2021, 113
  • [46] Spark-based ensemble learning for imbalanced data classification
    Ding J.
    Wang S.
    Jia L.
    You J.
    Jiang Y.
    International Journal of Performability Engineering, 2018, 14 (05) : 945 - 964
  • [47] Universum based kernelized weighted extreme learning machine for imbalanced datasets
    Bhagat Singh Raghuwanshi
    Akansha Mangal
    Sanyam Shukla
    International Journal of Machine Learning and Cybernetics, 2022, 13 : 3387 - 3408
  • [48] Universum based kernelized weighted extreme learning machine for imbalanced datasets
    Raghuwanshi, Bhagat Singh
    Mangal, Akansha
    Shukla, Sanyam
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2022, 13 (11) : 3387 - 3408
  • [49] Spatial Ensemble Anomaly Detection Method for Exhaustive Map-Based Datasets
    Liu, Wendi
    Pyrcz, Michael J.
    ENERGY EXPLORATION & EXPLOITATION, 2023, 41 (02) : 406 - 420
  • [50] FIM-Based Pairwise Selection for Active Learning on Imbalanced Datasets
    Chen, Lixing
    Tian, Xuemin
    Cai, Lianfang
    2015 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2015): BIG DATA ANALYTICS FOR HUMAN-CENTRIC SYSTEMS, 2015, : 1876 - 1881