To improve classification of imbalanced datasets

Cited by: 0
Authors
Shukla, Pratyusha [1]
Bhowmick, Kiran [1]
Affiliations
[1] DJ Sanghvi Coll Engn, Dept Comp Engn, Bombay, Maharashtra, India
Keywords
Imbalanced data; K-Means; SVM
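DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract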
In data mining, classification is the task of accurately predicting the target class for each case in the data. Classifying a balanced dataset is fairly straightforward, but the task becomes difficult when the data is not balanced. The class imbalance problem arises in machine learning when the number of samples in one class (positive) is far smaller than the number in another class (negative). In this paper, we use the K-Means algorithm to balance an imbalanced dataset and then use an SVM to classify the balanced dataset. We compare accuracy, precision, recall, and classification time on the balanced and imbalanced datasets. The results show that K-Means helps balance the data, and that accuracy and classification time on the balanced dataset are much better than when the imbalanced dataset is classified directly.
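The abstract does not say how K-Means is applied to balance the data, so the following is only a minimal sketch under one common assumption: the majority (negative) class is clustered into as many clusters as there are minority (positive) samples, and the cluster centroids stand in for the majority class. The synthetic dataset, the RBF kernel, and the helper names `kmeans_undersample` and `fit_and_score` are illustrative choices, not taken from the paper.

```python
import time

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def kmeans_undersample(X, y, majority_label=0, random_state=0):
    """Assumed balancing scheme: replace the majority class by K-Means centroids."""
    X_maj = X[y == majority_label]
    X_min = X[y != majority_label]
    y_min = y[y != majority_label]
    k = len(X_min)  # one centroid per minority sample, so both classes end up equal in size
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_maj)
    X_bal = np.vstack([km.cluster_centers_, X_min])
    y_bal = np.concatenate([np.full(k, majority_label), y_min])
    return X_bal, y_bal


def fit_and_score(X_train, y_train, X_test, y_test):
    """Train an SVM and report the four quantities named in the abstract."""
    start = time.perf_counter()
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    train_seconds = time.perf_counter() - start
    pred = clf.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "train_seconds": train_seconds,
    }


# Synthetic two-class data with roughly a 95:5 split (an assumption for illustration).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Balance only the training split; the test split keeps its original distribution.
X_bal, y_bal = kmeans_undersample(X_train, y_train)

print("imbalanced:", fit_and_score(X_train, y_train, X_test, y_test))
print("balanced:  ", fit_and_score(X_bal, y_bal, X_test, y_test))
```

Balancing only the training split keeps the evaluation honest, since the test data still reflects the original class ratio. Whether the balanced model scores better depends on the dataset, so the printed numbers should be read as a local experiment rather than a reproduction of the paper's results.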
Pages: 5