An improved kNN text categorization algorithm based on cluster distribution

Cited by: 0
Authors
Luo, Yuansheng [1, 2]
Wang, Minweng [2, 3]
Le, Zhongjian [2]
Zhang, Huawei [1]
Affiliations
[1] Modern Education Technology Center, Jiangxi University of Finance and Economics, Nanchang 330013, China
[2] School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013, China
[3] School of Computer Information and Engineering, Jiangxi Normal University, Nanchang 330022, China
Keywords
Nearest neighbor search; Text processing; Learning algorithms; Sampling; Clustering algorithms
DOI
Not available
Abstract
The traditional kNN text classification algorithm uses all training samples for classification, so its computational cost is very high when the number of training samples is large. To address this problem, the paper proposes an improved kNN text classification algorithm based on cluster distribution. First, the training samples of each category are clustered with the k-means algorithm, and all cluster centers are taken as the new training samples. Second, a weight value is introduced that integrates the contribution of large clusters, the effect of dispersed clusters, and the distribution of clusters. Finally, the modified samples are used as the training set for kNN text classification. Experiments on the Fudan University text classification corpus and the 20 Newsgroups data set show that the proposed algorithm not only effectively reduces the actual number of training samples and lowers the computational complexity, but also improves the accuracy of the kNN text classification algorithm. 1553-9105/Copyright © 2012 Binary Information Press.
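The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weight formula, so the weight used here (relative cluster size divided by one plus the mean distance to the center) is only an assumed stand-in, not the authors' definition. The function names (`kmeans`, `build_prototypes`, `classify`), the Euclidean distance, and the parameter `clusters_per_class` are likewise hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (center, members) pairs for non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centers[i]))
            groups[nearest].append(p)
        # Keep the previous center if a cluster received no points.
        centers = [mean_vector(g) if g else c for g, c in zip(groups, centers)]
    return [(c, g) for c, g in zip(centers, groups) if g]


def build_prototypes(train, clusters_per_class=2):
    """Step 1 and 2: cluster each category and attach a weight to each center.

    train maps a label to a list of feature vectors.
    Returns a list of (center, label, weight) triples.
    """
    prototypes = []
    for label, vectors in train.items():
        k = min(clusters_per_class, len(vectors))
        for center, members in kmeans(vectors, k):
            spread = sum(euclidean(m, center) for m in members) / len(members)
            # ASSUMED weight, not the paper's formula:
            # larger and tighter clusters contribute more.
            weight = (len(members) / len(vectors)) / (1.0 + spread)
            prototypes.append((center, label, weight))
    return prototypes


def classify(prototypes, query, k=3):
    """Step 3: weighted kNN vote over the cluster centers only."""
    nearest = sorted(prototypes, key=lambda p: euclidean(query, p[0]))[:k]
    votes = defaultdict(float)
    for center, label, weight in nearest:
        votes[label] += weight / (1e-9 + euclidean(query, center))
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for document feature vectors.
    toy_train = {
        "sports": [[0, 0], [0, 1], [1, 0], [1, 1]],
        "finance": [[10, 10], [10, 11], [11, 10], [11, 11]],
    }
    toy_prototypes = build_prototypes(toy_train, clusters_per_class=2)
    print(len(toy_prototypes), classify(toy_prototypes, [0.5, 0.5]))
```

In this sketch the classifier compares a query against at most `clusters_per_class` centers per category instead of every training document, which is where the reduction in computation described in the abstract would come from. A real text classifier would more likely use TF-IDF vectors and cosine similarity.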
Pages: 1255 - 1263
Related papers
50 in total
  • [31] An Improved KNN Algorithm Based on Minority Class Distribution for Imbalanced Dataset
    Zang, Bo
    Huang, Ruochen
    Wang, Lei
    Chen, Jianxin
    Tian, Feng
    Wei, Xin
    2016 INTERNATIONAL COMPUTER SYMPOSIUM (ICS), 2016, : 696 - 700
  • [32] An enhanced text categorization method based on improved text frequency approach and mutual information algorithm
    Pei Zhili
    Shi Xiaohu
    Marchese, Maurizio
    Liang Yanchun
    PROGRESS IN NATURAL SCIENCE-MATERIALS INTERNATIONAL, 2007, 17 (12) : 1494 - 1500
  • [34] An enhanced text categorization method based on improved text frequency approach and mutual information algorithm
    Key Laboratory of Symbol Computation and Knowledge Engineering, College of Computer Science and Technology, Jilin University, Changchun 130012, China
    Unknown
    Unknown
    Prog Nat Sci, 2007, (12): 1494 - 1500
  • [35] An Improved Strategy of the Feature Selection Algorithm for the Text Categorization
    Yang, Jieming
    Lu, Yixin
    Liu, Zhiying
    2019 20TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2019, : 3 - 7
  • [36] The study on Web product reviews mining based on an improved text categorization algorithm
    Hu, Dongbin
    Luo, Lixia
    Xu, Lihua
    ELECTRONIC-BUSINESS INTELLIGENCE: FOR CORPORATE COMPETITIVE ADVANTAGES IN THE AGE OF EMERGING TECHNOLOGIES & GLOBALIZATION, 2010, 14 : 449 - 455
  • [37] Using kNN model for automatic text categorization
    Guo, GD
    Wang, H
    Bell, D
    Bi, YX
    Greer, K
    SOFT COMPUTING, 2006, 10 (05) : 423 - 430
  • [38] Using kNN model for automatic text categorization
    Gongde Guo
    Hui Wang
    David Bell
    Yaxin Bi
    Kieran Greer
    Soft Computing, 2006, 10 : 423 - 430
  • [39] An improved co-training text categorization algorithm based on diversity measures
    College of Information and Science Technique, Dalian Maritime University, Dalian 116026, China
    Unknown
    Tien Tzu Hsueh Pao, 2008, SUPPL.: 138 - 143
  • [40] An kNN model-based approach and its application in text categorization
    Guo, GD
    Wang, H
    Bell, D
    Bi, YX
    Greer, K
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2004, 2945 : 559 - 570