To improve classification of imbalanced datasets

Cited by: 0
Authors
Shukla, Pratyusha [1]
Bhowmick, Kiran [1]
Affiliation
[1] DJ Sanghvi Coll Engn, Dept Comp Engn, Bombay, Maharashtra, India
Keywords
Imbalanced data; K-Means; SVM
DOI
Not available
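Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809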
Abstract
In data mining, classification is the task of accurately predicting the target class of each case in the data. Classifying a balanced dataset is fairly straightforward, but the task becomes difficult when the data is not balanced. The class imbalance problem arises in machine learning when the number of examples in one class (the positive class) is far smaller than the number in another (the negative class). In this paper, we use the K-Means algorithm to balance an imbalanced dataset and then use a support vector machine (SVM) to classify the balanced dataset. We compare accuracy, precision, recall, and classification time on the balanced and imbalanced datasets. The results show that K-Means helps balance the data, and that both accuracy and classification time are much better on the balanced dataset than on the imbalanced one.
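The abstract does not say how K-Means is applied to balance the data. One common reading, assumed here, is cluster-based undersampling: the majority class is clustered into as many groups as there are minority examples, and each cluster is replaced by its centroid. The sketch below illustrates that idea on synthetic 2-D data and then trains a small linear SVM by subgradient descent on the hinge loss. All function names, the toy data, and the hyperparameters are hypothetical and are not taken from the paper.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns k centroids as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids

def undersample_majority(majority, minority, seed=0):
    """Assumed balancing step: shrink the majority class to len(minority) centroids."""
    return kmeans(majority, k=len(minority), seed=seed)

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM via full-batch subgradient descent on L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (minority/positive) or -1 (majority/negative)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def make_toy_data(seed=0):
    """Synthetic imbalanced data: 200 negatives near (0, 0), 20 positives near (6, 6)."""
    rng = random.Random(seed)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    minority = [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(20)]
    return majority, minority

if __name__ == "__main__":
    majority, minority = make_toy_data()
    reduced = undersample_majority(majority, minority)
    X = reduced + minority
    y = [-1] * len(reduced) + [1] * len(minority)
    w, b = train_linear_svm(X, y)
    print("class sizes after balancing:", len(reduced), len(minority))
    print("prediction at (6, 6):", predict(w, b, (6.0, 6.0)))
```

Because the SVM is trained on far fewer majority points after balancing, training time drops along with the imbalance, which is consistent with the timing comparison the abstract describes. In practice, library implementations such as scikit-learn's `KMeans` and `SVC` would replace the hand-written routines above.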
Pages: 5