A hybrid system for imbalanced data mining

Cited by: 5
Authors
Lee, Zne-Jung [1]
Lee, Chou-Yuan [1]
Chou, So-Tsung [1]
Ma, Wei-Ping [1]
Ye, Fulan [1]
Chen, Zhen [2, 3]
Affiliations
[1] Fuzhou Univ Int Studies & Trade, Sch Technol, Fuzhou, Fujian, Peoples R China
[2] Fuzhou Univ Int Studies & Trade, Acad Affairs, Fuzhou, Fujian, Peoples R China
[3] Angeles Univ Fdn, Dept Database Technol & Data Min, Angeles, Philippines
Keywords
Hybrid systems; Decision trees; Support vector machines
DOI
10.1007/s00542-019-04566-1
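Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract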
In the era of information explosion, the production and collection of data are growing massively. Data mining is the process of finding valuable information in data. In imbalanced data, the majority classes have more instances than the minority classes. As imbalanced data grows, classifiers focus mainly on the majority classes and neglect the minority classes, so these problems become harder and harder to solve. Another obstacle for imbalanced data mining is the lack of resources such as distributed mechanisms. Thus, it is not easy to solve these problems with traditional data mining algorithms such as decision trees, random forests, and support vector machines. In this paper, a hybrid system based on the support vector machine (SVM) and Apache Spark is proposed for imbalanced data mining. In the proposed system, SVM with two approaches is implemented on Apache Spark to process imbalanced data in parallel. Two datasets from the UCI repository are used to verify the correctness of the proposed system. Simulation results demonstrate that the proposed system significantly improves classification accuracy.
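The abstract does not say which two SVM approaches the authors use, so the following is only an illustrative sketch of the underlying problem, not the paper's method. It assumes a hypothetical toy label set (95 majority, 5 minority instances) and uses class weighting, a common remedy chosen here purely for illustration. It shows how a classifier that ignores the minority class still scores high accuracy, and how class weights make that failure visible in a hinge loss of the kind an SVM minimizes.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, cls):
    """Fraction of instances of class `cls` that were predicted as `cls`."""
    hits = [p == cls for t, p in zip(y_true, y_pred) if t == cls]
    return sum(hits) / len(hits)


def mean_hinge_loss(y_true, scores, weights=None):
    """Mean hinge loss for labels in {-1, +1}, optionally class-weighted."""
    total = 0.0
    for y, s in zip(y_true, scores):
        w = 1.0 if weights is None else weights[y]
        total += w * max(0.0, 1.0 - y * s)
    return total / len(y_true)


# Hypothetical toy data (an assumption, not from the paper):
# 95 majority (-1) and 5 minority (+1) instances.
y = [-1] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
always_majority = [-1] * len(y)

acc = accuracy(y, always_majority)
minority_recall = recall(y, always_majority, 1)
w = balanced_class_weights(y)

# Treat the constant prediction as a decision score of -1 for every instance.
plain_loss = mean_hinge_loss(y, [-1.0] * len(y))
weighted_loss = mean_hinge_loss(y, [-1.0] * len(y), w)

print(acc, minority_recall, w, plain_loss, weighted_loss)
```

With balanced weights, each minority error counts far more than a majority error, so the weighted loss penalizes the always-majority classifier much more heavily than the unweighted loss does, even though its accuracy looks high.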
Pages: 3043-3047
Page count: 5
Related papers
50 in total
  • [1] A hybrid system for imbalanced data mining
    Zne-Jung Lee
    Chou-Yuan Lee
    So-Tsung Chou
    Wei-Ping Ma
    Fulan Ye
    Zhen Chen
    Microsystem Technologies, 2020, 26 : 3043 - 3047
  • [2] Data Mining on Imbalanced Data Sets
    Gu, Qiong
    Cai, Zhihua
    Zhu, Li
    Huang, Bo
    2008 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER THEORY AND ENGINEERING, 2008, : 1020 - 1024
  • [3] Machine learning for mining imbalanced data
    Arafat, Md. Yasir
    Hoque, Sabera
    Xu, Shuxiang
    Farid, Dewan Md
    IAENG International Journal of Computer Science, 2019, 46 (02) : 332 - 348
  • [4] Hybrid sampling for imbalanced data
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 202 - 207
  • [5] Hybrid sampling for imbalanced data
    Seiffert, Chris
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    INTEGRATED COMPUTER-AIDED ENGINEERING, 2009, 16 (03) : 193 - 210
  • [6] A Hybrid Data Mining Approach for Intrusion Detection on Imbalanced NSL-KDD Dataset
    Parsaei, Mohammad Reza
    Rostami, Samaneh Miri
    Javidan, Reza
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (06) : 20 - 25
  • [7] Hybrid Algorithm Based on Simulated Annealing and Bacterial Foraging Optimization for Mining Imbalanced Data
    Lee, Chou-Yuan
    Lee, Zne-Jung
    Huang, Jian-Qiong
    Ye, Fu-Lan
    Yao, Jie
    Ning, Zheng-Yuan
    Meen, Teen-Hang
    SENSORS AND MATERIALS, 2021, 33 (04) : 1297 - 1312
  • [8] A particle swarm based hybrid system for imbalanced medical data sampling
    Yang P.
    Xu L.
    Zhou B.B.
    Zhang Z.
    Zomaya A.Y.
    BMC Genomics, 10 (Suppl 3)
  • [9] Collective of Base Classifiers for Mining Imbalanced Data
    Jedrzejowicz, Joanna
    Jedrzejowicz, Piotr
    COMPUTATIONAL SCIENCE, ICCS 2022, PT II, 2022, : 571 - 585
  • [10] Hybrid Classifier Ensemble for Imbalanced Data
    Yang, Kaixiang
    Yu, Zhiwen
    Wen, Xin
    Cao, Wenming
    Chen, C. L. Philip
    Wong, Hau-San
    You, Jane
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (04) : 1387 - 1400