Research on the Improvement of K-Nearest Neighbor Classifier for Imbalanced Text Categorization

Authors
Yang Yanmei [1]
Xu Linying [1]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
Keywords
Chinese text categorization; KNN; feature selection; SMOTE; Tomek-Links; SAMPLING METHOD
DOI
10.1109/IMCCC.2018.00204
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Some of the most widely used text classification methods, such as the K-Nearest Neighbor (KNN) algorithm, the Naive Bayes (NB) algorithm, and the Support Vector Machine (SVM) algorithm, perform well on balanced data but poorly on imbalanced data. Many researchers have proposed solutions to this problem; we also propose a new method to improve the performance of the K-Nearest Neighbor classifier on imbalanced classification. In this paper, we combine the K-Nearest Neighbor classifier with a new feature selection method called NFS, an improved Synthetic Minority Over-sampling Technique (SMOTE), and the Tomek Links under-sampling technique. The experimental results demonstrate that the improved method significantly improves the classification performance of the K-Nearest Neighbor classifier on the imbalanced dataset.
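To make the resampling idea in the abstract concrete, the following is a minimal sketch of over-sampling, Tomek-link cleaning, and KNN voting on toy 2-D points. It is an illustration under stated assumptions, not the authors' method: it uses plain SMOTE rather than the paper's improved variant, applies Tomek-link removal after over-sampling (one common ordering, not confirmed by the record), omits the NFS feature selection step (which the abstract does not describe), and all function names and data are hypothetical.

```python
import random
from collections import Counter


def dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(i, X, candidates, k=1):
    """Indices of the k points in `candidates` closest to X[i], excluding i."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: dist(X[i], X[j]))[:k]


def smote(X, y, minority, n_new, k=3, seed=0):
    """Plain SMOTE (assumes at least two minority samples): each synthetic
    point is interpolated between a minority point and a minority neighbor."""
    rng = random.Random(seed)
    idx = [i for i, label in enumerate(y) if label == minority]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        i = rng.choice(idx)
        j = rng.choice(nearest(i, X, idx, k))
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def remove_tomek_links(X, y, majority):
    """Drop the majority-class member of every Tomek link, i.e. every pair of
    mutual nearest neighbors that carry different labels."""
    everyone = range(len(X))
    nn = [nearest(i, X, everyone)[0] for i in everyone]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.update(p for p in (i, j) if y[p] == majority)
    keep = [i for i in everyone if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_predict(X, y, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    return Counter(y[i] for i in order).most_common(1)[0][0]


# Hypothetical toy data: 7 majority points (one of them lying next to the
# minority cluster) and 3 minority points.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1],
     [0.9, 0.95],
     [1.0, 1.0], [1.2, 1.1], [1.1, 1.3]]
y = ["maj"] * 7 + ["min"] * 3

query = [1.1, 1.1]
before = knn_predict(X, y, query, k=7)

X_s, y_s = smote(X, y, "min", n_new=4)             # balance the classes 7 : 7
X_r, y_r = remove_tomek_links(X_s, y_s, "maj")     # clean the class boundary
after = knn_predict(X_r, y_r, query, k=7)

print(before, after)
```

On this toy data the query sits inside the minority cluster, yet with k = 7 the three original minority points are outvoted by four majority points; after resampling, the seven nearest neighbors are all minority points, so the vote flips. In practice, library implementations such as those in imbalanced-learn would likely be preferable to this hand-written sketch.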
Pages: 968-972
Number of pages: 5