SOUL: Scala Oversampling and Undersampling Library for imbalance classification

Cited by: 1
Authors
Rodriguez, Nestor [1]
Lopez, David [1]
Fernandez, Alberto [1]
Garcia, Salvador [1]
Herrera, Francisco [1]
Affiliations
[1] Univ Granada, DaSCI Andalusian Inst Data Sci & Computat Intelli, Granada, Spain
Keywords
Oversampling; Undersampling; Scala; Imbalanced classification; SMOTE; PERFORMANCE; CHALLENGES; SELECTION; SPARK
DOI
10.1016/j.softx.2021.100767
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The improvements in technology and computation have promoted a global adoption of Data Science. It is devoted to extracting significant knowledge from high amounts of information by means of the application of Artificial Intelligence and Machine Learning tools. Among the different tasks within Data Science, classification is probably the most widespread overall. Focusing on the classification scenario, we often face some datasets in which the number of instances for one of the classes is much lower than that of the remaining ones. This issue is known as the imbalanced classification problem, and it is mainly related to the need for boosting the recognition of the minority class examples. In spite of a large number of solutions that were proposed in the specialized literature to address imbalanced classification, there is a lack of open-source software that compiles the most relevant ones in an easy-to-use and scalable way. In this paper, we present a novel software approach named SOUL, which stands for Scala Oversampling and Undersampling Library for imbalanced classification. The main capabilities of this new library include a large number of different data preprocessing techniques, efficient execution of these approaches, and a graphical environment to contrast the output for the different preprocessing solutions. © 2021 The Authors. Published by Elsevier B.V.
Pages: 8
Related papers
50 in total
  • [1] Undersampling of approaching the classification boundary for imbalance problem
    Jiang, Lei
    Yuan, Peng
    Liao, Jing
    Zhang, Qiongbing
    Liu, Jianxun
    Li, Keqin
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2023, 35 (06): : 1
  • [2] Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification
    Salehi, Amirreza
    Khedmati, Majid
    SCIENTIFIC REPORTS, 2025, 15 (01):
  • [3] Imbalance: Oversampling algorithms for imbalanced classification in R
    Cordon, Ignacio
    Garcia, Salvador
    Fernandez, Alberto
    Herrera, Francisco
    KNOWLEDGE-BASED SYSTEMS, 2018, 161 : 329 - 341
  • [4] CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification
    Koziarski, Michal
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [5] Efficient hybrid oversampling and intelligent undersampling for imbalanced big data classification
    Vairetti, Carla
    Assadi, Jose Luis
    Maldonado, Sebastian
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 246
  • [6] Enhanced classification of hyperspectral images using improvised oversampling and undersampling techniques
    Singh P.S.
    Singh V.P.
    Pandey M.K.
    Karthikeyan S.
    International Journal of Information Technology, 2022, 14 (1) : 389 - 396
  • [7] Comparative Analysis of Undersampling, Oversampling, and SMOTE Techniques for Addressing Class Imbalance in Phishing Website Detection
    Omari, Kamal
    Taoussi, Chaimae
    Oukhatar, Ayoub
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2025, 16 (02) : 751 - 757
  • [8] Subspace-based minority oversampling for imbalance classification
    Li, Tianjun
    Wang, Yingxu
    Liu, Licheng
    Chen, Long
    Chen, C. L. Philip
    INFORMATION SCIENCES, 2023, 621 : 371 - 388
  • [9] An Undersampling Method Approaching the Ideal Classification Boundary for Imbalance Problems
    Zhou, Wensheng
    Liu, Chen
    Yuan, Peng
    Jiang, Lei
    APPLIED SCIENCES-BASEL, 2024, 14 (13):
  • [10] Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems
    Ng, Wing W. Y.
    Hu, Junjie
    Yeung, Daniel S.
    Yin, Shaohua
    Roli, Fabio
    IEEE TRANSACTIONS ON CYBERNETICS, 2015, 45 (11) : 2402 - 2412