An improved kNN text categorization algorithm based on cluster distribution

Cited by: 0
Authors
Luo, Yuansheng [1, 2]
Wang, Minweng [2, 3]
Le, Zhongjian [2]
Zhang, Huawei [1]
Affiliations
[1] Modern Education Technology Center, Jiangxi University of Finance and Economics, Nanchang 330013, China
[2] School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013, China
[3] School of Computer Information and Engineering, Jiangxi Normal University, Nanchang 330022, China
Source
Keywords
Nearest neighbor search - Text processing - Learning algorithms - Sampling - Clustering algorithms
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The traditional kNN text classification algorithm uses all training samples for classification, so its computational cost is very high when the number of training samples is large. To address this problem, the paper proposes an improved kNN text classification algorithm based on cluster distribution. First, the training samples of each category are clustered with the k-means algorithm, and all cluster centers are taken as the new training samples. Second, a weight value is introduced that integrates the contribution of large clusters, the effect of dispersed clusters, and the cluster distribution. Finally, the modified samples are used as the training set for kNN text classification. Experiments on the Fudan University text classification corpus and the 20 Newsgroups data set show that the proposed algorithm not only effectively reduces the actual number of training samples and lowers the computational complexity, but also improves the accuracy of the kNN text classification algorithm. 1553-9105/Copyright © 2012 Binary Information Press.
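The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weight formula, so the weight used here (a cluster's share of its category's samples) is an assumption that covers only the "large cluster" contribution and ignores dispersion and cluster distribution. Euclidean distance on toy 2-D vectors stands in for whatever similarity the paper applies to document vectors, and all function names (`kmeans`, `build_reduced_set`, `classify`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centers):
    """Group each point under the index of its nearest center."""
    groups = [[] for _ in centers]
    for p in points:
        idx = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
        groups[idx].append(p)
    return groups


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, cluster sizes)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = assign(points, centers)
        # An empty cluster keeps its previous center.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, [len(g) for g in assign(points, centers)]


def build_reduced_set(train, clusters_per_class=2):
    """Step 1 and 2: cluster each category, keep weighted centers.

    train maps a label to a list of vectors. Returns a list of
    (center, label, weight) tuples.
    """
    reduced = []
    for label, points in train.items():
        k = min(clusters_per_class, len(points))
        centers, sizes = kmeans(points, k)
        for center, size in zip(centers, sizes):
            if size:
                # ASSUMED weight: the cluster's share of its category.
                # The paper's weight also reflects dispersion and
                # cluster distribution, which are not modeled here.
                reduced.append((center, label, size / len(points)))
    return reduced


def classify(x, reduced, k=3):
    """Step 3: weighted kNN vote over cluster centers only."""
    nearest = sorted(reduced, key=lambda r: euclidean(x, r[0]))[:k]
    votes = defaultdict(float)
    for center, label, weight in nearest:
        votes[label] += weight / (1.0 + euclidean(x, center))
    return max(votes, key=votes.get)


# Toy data: two well-separated categories of four samples each.
train = {
    "sports": [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2]],
    "finance": [[5.0, 5.1], [5.1, 5.0], [5.2, 5.1], [5.1, 5.2]],
}
reduced = build_reduced_set(train)
print(len(reduced), classify([0.05, 0.05], reduced))
```

Classification compares a query against the cluster centers rather than against every training sample, which is where the reduction in computation described in the abstract comes from.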
Pages: 1255 - 1263
Related papers
50 in total
  • [1] An improved KNN text categorization algorithm by adopting cluster technology
    Zhang, Xiao-Fei
    Huang, He-Yan
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2009, 22 (06): 936 - 940
  • [2] A KNN BASED ALGORITHM FOR TEXT CATEGORIZATION
    Bucar, Joze
    Povh, Janez
    SOR'13 PROCEEDINGS: THE 12TH INTERNATIONAL SYMPOSIUM ON OPERATIONAL RESEARCH IN SLOVENIA, 2013: 367 - 372
  • [3] Text categorization with KNN algorithm
    Zhang, Ning
    Jia, Ziyan
    Shi, Zhongzhi
    Jisuanji Gongcheng/Computer Engineering, 2005, 31 (08): 171 - 172
  • [4] KNN Text Categorization Algorithm Based on Semantic Centre
    Zhang Xiao-fei
    Huang He-yan
    Zhang Ke-liang
    2009 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND COMPUTER SCIENCE, VOL 1, PROCEEDINGS, 2009: 249 - +
  • [5] Using KNN Algorithm for Text Categorization
    Wajeed, M. A.
    Adilakshmi, T.
    COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250: 796 - +
  • [6] A fast KNN algorithm for text categorization
    Wang, Yu
    Wang, Zheng-Ou
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007: 3436 - +
  • [7] A simple KNN algorithm for text categorization
    Soucy, P
    Mineau, GW
    2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001: 647 - 648
  • [8] Research on text categorization based on improved DNN-KNN
    Wang, Xiye
    Jiang, Mingyang
    Pei, Zhili
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2018, 124: 53 - 53
  • [9] The Research of kNN Text Categorization Algorithm Based On Eager Learning
    Dong, Tao
    Cheng, Weinan
    Shang, Wenqian
    2012 INTERNATIONAL CONFERENCE ON INDUSTRIAL CONTROL AND ELECTRONICS ENGINEERING (ICICEE), 2012: 1120 - 1123
  • [10] Graph based KNN for Text Categorization
    Jo, Taeho
    2018 20TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT), 2018: 260 - 265