Oversampling imbalanced data in the string space

Cited by: 35
Authors
Castellanos, Francisco J. [1]
Valero-Mas, Jose J. [1]
Calvo-Zaragoza, Jorge [1]
Rico-Juan, Juan R. [1]
Affiliation
[1] Univ Alicante, Pattern Recognit & Artificial Intelligence Grp, Dept Software & Comp Syst, Alicante 03690, Spain
Keywords
Class imbalance problem; Oversampling; String space; SMOTE; NEAREST-NEIGHBOR; RECOGNITION
DOI
10.1016/j.patrec.2018.01.003
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Imbalanced data is a typical problem in supervised classification, occurring when the different classes are not equally represented. This typically biases the classifier's performance towards the class that contains the majority of the elements. Many methods have been proposed to alleviate this scenario, yet all of them assume that data is represented as feature vectors. In this paper we propose a strategy to balance a dataset whose samples are encoded as strings. Our approach is based on adapting the well-known Synthetic Minority Over-sampling Technique (SMOTE) algorithm to the string space. More precisely, data generation is achieved with an iterative approach that creates artificial strings within the segment between two given samples of the training set. Results with several datasets and imbalance ratios show that the proposed strategy properly deals with the problem in all cases considered. (c) 2018 Elsevier B.V. All rights reserved.
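The abstract describes generating synthetic strings "within the segment between two given samples", but this record does not give the algorithm's details. The following is therefore only a minimal Python sketch under our own assumptions, not the authors' method: Levenshtein (edit) distance serves as the string metric, and a synthetic string is formed by applying a random subset of the edit operations from one optimal alignment between a minority sample and one of its k nearest minority neighbours. The function names (`align`, `distance`, `interpolate`, `string_smote`) and the parameters `t`, `k`, and `seed` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# One aligned position: (char from a or None, char from b or None).
Pair = Tuple[Optional[str], Optional[str]]


def align(a: str, b: str) -> List[Pair]:
    """Optimal Levenshtein alignment of a and b (assumed metric).

    None on one side marks an insertion or deletion; unequal chars mark a
    substitution; equal chars mark a zero-cost match.
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # Backtrack from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((a[i - 1], None))  # delete a[i-1]
            i -= 1
        else:
            pairs.append((None, b[j - 1]))  # insert b[j-1]
            j -= 1
    return pairs[::-1]


def distance(a: str, b: str) -> int:
    """Edit distance = number of non-matching positions in the alignment."""
    return sum(1 for x, y in align(a, b) if x != y)


def interpolate(a: str, b: str, t: float, rng: random.Random) -> str:
    """Synthetic string on an edit path from a (t=0) to b (t=1).

    Applies round(t * k) of the k edit operations, chosen at random;
    matched characters are always kept.
    """
    pairs = align(a, b)
    edits = [idx for idx, (x, y) in enumerate(pairs) if x != y]
    chosen = set(rng.sample(edits, round(t * len(edits))))
    out = []
    for idx, (x, y) in enumerate(pairs):
        c = y if idx in chosen else x  # take b's side only for chosen edits
        if c is not None:
            out.append(c)
    return "".join(out)


def string_smote(minority: List[str], n_new: int, k: int = 3, seed: int = 0) -> List[str]:
    """SMOTE-style oversampling in string space (hypothetical sketch).

    Needs at least two minority strings. Each synthetic sample interpolates
    a random minority string with one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [s for j, s in enumerate(minority) if j != i]
        others.sort(key=lambda s: distance(base, s))
        neighbour = rng.choice(others[:k])
        synthetic.append(interpolate(base, neighbour, rng.random(), rng))
    return synthetic


if __name__ == "__main__":
    minority = ["abcab", "abcb", "aabcb", "abccb"]
    for s in string_smote(minority, n_new=4, k=2):
        print(s)
```

Because a synthetic string applies j of the k edits on an optimal path, its distances to the two parent strings add up exactly to the parents' own distance (j and k - j), which is one natural reading of a "segment" in string space. Whether the paper uses this construction, another metric, or a different neighbour selection is not stated in this record.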
Pages: 32-38
Number of pages: 7
Related papers (50 in total)
  • [1] Classification of Imbalanced Data by Oversampling in Kernel Space of Support Vector Machines
    Mathew, Josey
    Pang, Chee Khiang
    Luo, Ming
    Leong, Weng Hoe
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (09) : 4065 - 4076
  • [2] Grouping-based Oversampling in Kernel Space for Imbalanced Data Classification
    Ren, Jinjun
    Wang, Yuping
    Cheung, Yiu-ming
    Gao, Xiao-Zhi
    Guo, Xiaofang
    [J]. PATTERN RECOGNITION, 2023, 133
  • [3] Oversampling techniques for imbalanced data in regression
    Belhaouari, Samir Brahim
    Islam, Ashhadul
    Kassoul, Khelil
    Al-Fuqaha, Ala
    Bouzerdoum, Abdesselam
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 252
  • [4] Adaptive Oversampling for Imbalanced Data Classification
    Ertekin, Seyda
    [J]. INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 261 - 269
  • [5] A new oversampling method in the string space
    Briones-Segovia, Victor A.
    Jimenez-Villar, Victor
    Ariel Carrasco-Ochoa, Jesus
    Fco Martinez-Trinidad, Jose
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 183
  • [6] Selective oversampling approach for strongly imbalanced data
    Gnip, P.
    Vokorokos, L.
    Drotár, P.
    [J]. PEERJ COMPUTER SCIENCE, 2021, 7 : 1 - 22
  • [7] Selective oversampling approach for strongly imbalanced data
    Gnip, Peter
    Vokorokos, Liberios
    Drotar, Peter
    [J]. PEERJ COMPUTER SCIENCE, 2021
  • [8] Oversampling for Imbalanced Data via Optimal Transport
    Yan, Yuguang
    Tan, Mingkui
    Xu, Yanwu
    Cao, Jiezhang
    Ng, Michael
    Min, Huaqing
    Wu, Qingyao
    [J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 5605 - 5612
  • [9] Oversampling the minority class in a multi-linear feature space for imbalanced data classification
    Liang, Peifeng
    Li, Weite
    Hu, Jinglu
    [J]. IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2018, 13 (10) : 1483 - 1491
  • [10] Novel Oversampling Algorithm for Handling Imbalanced Data Classification
    More, Anjali S.
    Rana, Dipti P.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (08) : 491 - 496