An insight into imbalanced Big Data classification: outcomes and challenges

Cited by: 135
Authors
Fernandez, Alberto [1 ]
del Rio, Sara [1 ]
Chawla, Nitesh V. [2 ,3 ]
Herrera, Francisco [1 ]
Affiliations
[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain
[2] Univ Notre Dame, Dept Comp Sci & Engn, 384 Fitzpatrick Hall, Notre Dame, IN 46556 USA
[3] Univ Notre Dame, Interdisciplinary Ctr Network Sci & Applicat, 384 Nieuwland Hall Sci, Notre Dame, IN 46556 USA
Funding
National Science Foundation (USA);
Keywords
Big Data; Imbalanced classification; MapReduce; Pre-processing; Sampling; MAPREDUCE; PERFORMANCE; COMBINATION; SYSTEMS; SMOTE;
DOI
10.1007/s40747-017-0037-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Big Data applications have been emerging in recent years, and researchers from many disciplines are aware of the significant advantages of extracting knowledge from this type of problem. However, traditional learning approaches cannot be directly applied due to scalability issues. To overcome this issue, the MapReduce framework has arisen as a "de facto" solution. Basically, it carries out a "divide-and-conquer" distributed procedure in a fault-tolerant way suited to commodity hardware. As this is still a recent discipline, little research has been conducted on imbalanced classification for Big Data. The main reason is the difficulty of adapting standard techniques to the MapReduce programming style. Additionally, intrinsic problems of imbalanced data, namely lack of data and small disjuncts, are accentuated when the data are partitioned to fit the MapReduce programming style. This paper is built on three main pillars. First, to present the first outcomes for imbalanced classification in Big Data problems, introducing the current state of research in this area. Second, to analyze the behavior of standard pre-processing techniques in this particular framework. Finally, taking into account the experimental results obtained throughout this work, we discuss the challenges and future directions for the topic.
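The partitioning effect described in the abstract can be illustrated with a minimal single-machine sketch. It assumes random oversampling as a stand-in for the "standard pre-processing techniques" the abstract mentions; it is not the authors' implementation and it does not use a real MapReduce runtime. The function names (`partition`, `map_oversample`, `reduce_concat`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def partition(data, num_partitions, seed=0):
    """Shuffle the data and deal it into chunks, one per (simulated) mapper."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::num_partitions] for i in range(num_partitions)]


def map_oversample(chunk, seed=0):
    """Map step (assumed technique): random oversampling inside one chunk.

    Instances of the smaller classes are duplicated at random until every
    class present in the chunk matches the size of the largest one.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in chunk)
    if len(counts) < 2:
        # A class is absent from this chunk, so there is nothing to balance.
        return list(chunk)
    target = max(counts.values())
    balanced = list(chunk)
    for label, count in counts.items():
        pool = [row for row in chunk if row[1] == label]
        balanced.extend(rng.choice(pool) for _ in range(target - count))
    return balanced


def reduce_concat(chunks):
    """Reduce step: concatenate the pre-processed chunks into one dataset."""
    return [row for chunk in chunks for row in chunk]


# Hypothetical toy data: 1000 majority rows (label 0), 20 minority rows (label 1).
data = [((i,), 0) for i in range(1000)] + [((i,), 1) for i in range(20)]
parts = partition(data, num_partitions=10)

# Each mapper sees only a fraction of the already scarce minority class.
minority_per_part = [sum(1 for _, y in p if y == 1) for p in parts]

balanced = reduce_concat(map_oversample(p, seed=i) for i, p in enumerate(parts))
```

Inspecting `minority_per_part` shows how few minority instances each chunk receives once the data are split, which is the "lack of data" problem the abstract says partitioning accentuates. A chunk that receives no minority instances at all cannot be balanced locally, so the sketch returns it unchanged.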
Pages: 105-120
Number of pages: 16
Related papers
50 in total
  • [1] An insight into imbalanced Big Data classification: outcomes and challenges
    Alberto Fernández
    Sara del Río
    Nitesh V. Chawla
    Francisco Herrera
    [J]. Complex & Intelligent Systems, 2017, 3 : 105 - 120
  • [2] Evolutionary Undersampling for Imbalanced Big Data Classification
    Triguero, I.
    Galar, M.
    Vluymans, S.
    Cornelis, C.
    Bustince, H.
    Herrera, F.
    Saeys, Y.
    [J]. 2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015, : 715 - 722
  • [3] Imbalanced Big Data Classification: A Distributed Implementation of SMOTE
    Rastogi, Avnish Kumar
    Narang, Nitin
    Siddiqui, Zamir Ahmad
    [J]. PROCEEDINGS OF THE WORKSHOP PROGRAM OF THE 19TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING (ICDCN'18), 2018,
  • [4] Distributed classification for imbalanced big data in distributed environments
    Wang, Huihui
    Xiao, Mingfei
    Wu, Changsheng
    Zhang, Jing
    [J]. WIRELESS NETWORKS, 2024, 30 (05) : 3657 - 3668
  • [5] Severely imbalanced Big Data challenges: investigating data sampling approaches
    Tawfiq Hasanin
    Taghi M. Khoshgoftaar
    Joffrey L. Leevy
    Richard A. Bauder
    [J]. Journal of Big Data, 6
  • [6] Severely imbalanced Big Data challenges: investigating data sampling approaches
    Hasanin, Tawfiq
    Khoshgoftaar, Taghi M.
    Leevy, Joffrey L.
    Bauder, Richard A.
    [J]. JOURNAL OF BIG DATA, 2019, 6 (01)
  • [7] Multi-class imbalanced big data classification on Spark
    Sleeman, William C.
    Krawczyk, Bartosz
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 212
  • [8] An Imbalanced Dataset and Class Overlapping Classification Model for Big Data
    Prince, Mini
    Prathap, P. M. Joe
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2023, 44 (02) : 1009 - 1024
  • [9] Informative Evaluation Metrics for Highly Imbalanced Big Data Classification
    Hancock, John
    Khoshgoftaar, Taghi M.
    Johnson, Justin M.
    [J]. 2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1419 - 1426
  • [10] Imbalanced big data classification based on virtual reality in cloud computing
    Wen-da Xie
    Xiaochun Cheng
    [J]. Multimedia Tools and Applications, 2020, 79 : 16403 - 16420