A hybrid system for imbalanced data mining

Cited by: 5
Authors
Lee, Zne-Jung [1]
Lee, Chou-Yuan [1]
Chou, So-Tsung [1]
Ma, Wei-Ping [1]
Ye, Fulan [1]
Chen, Zhen [2, 3]
Affiliations
[1] Fuzhou Univ Int Studies & Trade, Sch Technol, Fuzhou, Fujian, Peoples R China
[2] Fuzhou Univ Int Studies & Trade, Acad Affairs, Fuzhou, Fujian, Peoples R China
[3] Angeles Univ Fdn, Dept Database Technol & Data Min, Angeles, Philippines
Keywords
Hybrid systems; Decision trees; Support vector machines
DOI
10.1007/s00542-019-04566-1
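Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract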
In the era of information explosion, the production and collection of data are growing massively. Data mining is the process of finding valuable information in data. In imbalanced data, the majority classes have more instances than the minority classes. As imbalanced data grows, the majority classes receive most of the attention and the minority classes are neglected, so the problem becomes harder and harder to solve. Another obstacle to imbalanced data mining is the lack of adequate resources such as a distributed mechanism. Thus, these problems are not easy to solve with traditional data mining algorithms such as decision trees, random forests, and support vector machines (SVM). In this paper, a hybrid system based on SVM and Apache Spark is proposed for imbalanced data mining. In the proposed system, SVM with two approaches is implemented on Apache Spark to process imbalanced data in parallel. Two datasets from the UCI repository are used to verify the correctness of the proposed system. Simulation results demonstrate that the proposed system can significantly improve classification accuracy.
Pages: 3043-3047
Number of pages: 5
Related papers
50 in total
  • [21] A Hybrid Approach for Binary Classification of Imbalanced Data
    Tsai, Hsinhan
    Yang, Ta-Wei
    Wong, Wai-Man
    Kao, Han-Yi
    Chou, Cheng-Fu
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2024, 23 (03)
  • [22] Progressive Hybrid Classifier Ensemble for Imbalanced Data
    Yang, Kaixiang
    Yu, Zhiwen
    Chen, C. L. Philip
    Cao, Wenming
    Wong, Hau-San
    You, Jane
    Han, Guoqiang
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2022, 52 (04) : 2464 - 2478
  • [23] Imbalanced Data Stream Classification Using Hybrid Data Preprocessing
    Bobowska, Barbara
    Klikowski, Jakub
    Wozniak, Michal
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 402 - 413
  • [24] Fast Stochastic Recursive Momentum Methods for Imbalanced Data Mining
    Wu, Xidong
    Huang, Feihu
    Huang, Heng
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022 : 578 - 587
  • [25] Mining impact-targeted activity patterns in imbalanced data
    Cao, Longbing
    Zhao, Yanchang
    Zhang, Chengqi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (08) : 1053 - 1066
  • [26] Imbalanced Data Mining Using Oversampling and Cellular GEP Ensemble
    Jedrzejowicz, Joanna
    Jedrzejowicz, Piotr
    COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2021), 2021, 12876 : 360 - 372
  • [27] A Hybrid Sampling SVM Approach to Imbalanced Data Classification
    Wang, Qiang
    ABSTRACT AND APPLIED ANALYSIS, 2014
  • [28] Undersampling Instance Selection for Hybrid and Incomplete Imbalanced Data
    Camacho-Nieto, Oscar
    Yanez-Marquez, Cornelio
    Villuendas-Rey, Yenny
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2020, 26 (06) : 698 - 719
  • [29] CLUS: A New Hybrid Sampling Classification for Imbalanced Data
    Prachuabsupakij, Wanthanee
    PROCEEDINGS OF THE 2015 12TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2015 : 281 - 286
  • [30] A weighted hybrid ensemble method for classifying imbalanced data
    Zhao, Jiakun
    Jin, Ju
    Chen, Si
    Zhang, Ruifeng
    Yu, Bilin
    Liu, Qingfang
    KNOWLEDGE-BASED SYSTEMS, 2020, 203