A hybrid system for imbalanced data mining

Cited by: 5
Authors
Lee, Zne-Jung [1]
Lee, Chou-Yuan [1]
Chou, So-Tsung [1]
Ma, Wei-Ping [1]
Ye, Fulan [1]
Chen, Zhen [2, 3]
Affiliations
[1] Fuzhou Univ Int Studies & Trade, Sch Technol, Fuzhou, Fujian, Peoples R China
[2] Fuzhou Univ Int Studies & Trade, Acad Affairs, Fuzhou, Fujian, Peoples R China
[3] Angeles Univ Fdn, Dept Database Technol & Data Min, Angeles, Philippines
Keywords
Hybrid systems; Decision trees; Support vector machines
DOI
10.1007/s00542-019-04566-1
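Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract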
In the era of information explosion, the production and collection of data are growing massively. Data mining is the process of finding valuable information in data. In imbalanced data, the majority classes have more instances than the minority classes. As imbalanced data grows, the majority classes receive most of the focus and the importance of the minority classes is ignored, so these problems become harder and harder to solve. Another obstacle for imbalanced data mining is the lack of suitable resources such as a distributed mechanism. Thus, it is not easy to solve these problems with traditional data mining algorithms such as decision tree, random forest and support vector machine (SVM). In this paper, a hybrid system based on SVM and Apache Spark is proposed for imbalanced data mining. In the proposed system, SVM with two approaches is implemented on Apache Spark to process imbalanced data in parallel. Two datasets from the UCI repository are used to verify the correctness of the proposed system. Simulation results demonstrate that the classification accuracy can be significantly improved by the proposed system.
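The abstract's point that the majority classes dominate can be illustrated with a small sketch. It is not the paper's method: the toy labels, the helper names (`class_weights`, `accuracy`, `balanced_accuracy`) and the inverse-frequency weighting are assumptions chosen for illustration only. It shows that a classifier that always predicts the majority class reaches high plain accuracy while missing every minority instance, and how inverse-frequency class weights would raise the cost of minority errors.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
print(class_weights(y_true)[1])           # 10.0
```

Under these assumptions, plain accuracy hides the complete failure on the minority class, whereas balanced accuracy exposes it. Weights of this kind are one common way to make a classifier such as an SVM pay more attention to minority instances.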
Pages: 3043-3047
Page count: 5