A hybrid system for imbalanced data mining

Cited by: 5
Authors
Lee, Zne-Jung [1]
Lee, Chou-Yuan [1]
Chou, So-Tsung [1]
Ma, Wei-Ping [1]
Ye, Fulan [1]
Chen, Zhen [2, 3]
Affiliations
[1] Fuzhou Univ Int Studies & Trade, Sch Technol, Fuzhou, Fujian, Peoples R China
[2] Fuzhou Univ Int Studies & Trade, Acad Affairs, Fuzhou, Fujian, Peoples R China
[3] Angeles Univ Fdn, Dept Database Technol & Data Min, Angeles, Philippines
Keywords
Hybrid systems; Decision trees; Support vector machines
DOI
10.1007/s00542-019-04566-1
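Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809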
Abstract
In the era of information explosion, the production and collection of data are growing massively. Data mining is the process of finding valuable information in data. In imbalanced data, the majority classes have more instances than the minority classes. As imbalanced data grows, the majority classes receive most of the focus and the importance of the minority classes is ignored, so these problems become harder and harder to solve. Another obstacle for imbalanced data mining is the lack of skilled resources such as a distributed mechanism. Thus, it is not easy to solve these problems with traditional data mining algorithms such as decision tree, random forest and support vector machine (SVM). In this paper, a hybrid system based on SVM and Apache Spark is proposed for imbalanced data mining. In the proposed system, SVM with two approaches is implemented on Apache Spark to process imbalanced data in parallel. Two datasets from the UCI repository are used to verify the correctness of the proposed system. Simulation results demonstrate that the classification accuracy can be significantly improved by the proposed system.
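The imbalance problem described in the abstract can be illustrated with a small sketch. This is not the authors' system: the abstract does not specify the two SVM approaches, and the toy labels and helper names (`balanced_class_weights`, `recall_per_class`) below are hypothetical. The sketch only shows why plain accuracy hides poor minority-class performance, and how inverse-frequency class weights, one common remedy, might be computed.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def recall_per_class(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hypothetical toy data: 95 majority-class (0) and 5 minority-class (1) labels.
y_true = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = recall_per_class(y_true, y_pred)
weights = balanced_class_weights(y_true)

print("accuracy:", accuracy)          # high despite missing every minority instance
print("recall per class:", recalls)   # minority recall exposes the failure
print("class weights:", weights)      # minority class receives the larger weight
```

Weights of this kind could be passed to a weighted classifier so that errors on the minority class cost more; whether the paper uses weighting, resampling, or another technique is not stated in the abstract.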
Pages: 3043 - 3047
Page count: 5
Related papers
50 in total
  • [31] A hybrid imbalanced classification model based on data density
    Shi, Shengnan
    Li, Jie
    Zhu, Dan
    Yang, Fang
    Xu, Yong
    INFORMATION SCIENCES, 2023, 624 : 50 - 67
  • [32] Hybrid kernel machine ensemble for imbalanced data sets
    Li, Peng
    Chan, Kap Luk
    Fang, Wen
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 1108 - +
  • [33] A weighted hybrid ensemble method for classifying imbalanced data
    Zhao, Jiakun
    Jin, Ju
    Chen, Si
    Zhang, Ruifeng
    Yu, Bilin
    Liu, Qingfang
    Knowledge-Based Systems, 2020, 203
  • [34] An intelligent manufacturing process diagnosis system using hybrid data mining
    Hur, Joon
    Lee, Hongchul
    Baek, Jun-Geol
    ADVANCES IN DATA MINING: APPLICATIONS IN MEDICINE, WEB MINING, MARKETING, IMAGE AND SIGNAL MINING, 2006, 4065 : 561 - 575
  • [35] Intrusion Detection System by Using Hybrid Algorithm of Data Mining Technique
    Foroushani, Zohreh Abtahi
    Li, Yue
    PROCEEDINGS OF 2018 7TH INTERNATIONAL CONFERENCE ON SOFTWARE AND COMPUTER APPLICATIONS (ICSCA 2018), 2018, : 119 - 123
  • [36] Prediction of Depression for Undergraduate Students Based on Imbalanced Data by Using Data Mining Techniques
    Narkbunnum, Warawut
    Wisaeng, Kittipol
    APPLIED SYSTEM INNOVATION, 2022, 5 (06)
  • [37] The Gradual Resampling Ensemble for mining imbalanced data streams with concept drift
    Ren, Siqi
    Liao, Bo
    Zhu, Wen
    Li, Zeng
    Liu, Wei
    Li, Keqin
    NEUROCOMPUTING, 2018, 286 : 150 - 166
  • [38] Research on data mining method for imbalanced dataset based on improved SMOTE
    Yang, Zhi-Ming
    Qiao, Li-Yan
    Peng, Xi-Yuan
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2007, 36 (SUPPL. 2): 22 - 26
  • [39] A dynamic ensemble learning based data mining framework for medical imbalanced big data
    Rithani, M.
    Kumar, R. Prasanna
    Ali, Altalbe
    KNOWLEDGE-BASED SYSTEMS, 2025, 310
  • [40] A Novel Strategy for Mining Highly Imbalanced Data in Credit Card Transactions
    Zareapoor, Masoumeh
    Yang, Jie
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2018, 24 (04): 721 - 727