An improved kNN text categorization algorithm based on cluster distribution

Cited by: 0
Authors
Luo, Yuansheng [1, 2]
Wang, Minweng [2, 3]
Le, Zhongjian [2 ]
Zhang, Huawei [1 ]
Affiliations
[1] Modern Education Technology Center, Jiangxi University of Finance and Economics, Nanchang 330013, China
[2] School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013, China
[3] School of Computer Information and Engineering, Jiangxi Normal University, Nanchang 330022, China
Keywords
Nearest neighbor search; Text processing; Learning algorithms; Sampling; Clustering algorithms
DOI
Not available
Abstract
The traditional kNN text classification algorithm uses all training samples for classification, so its computational cost is very high when the number of training samples is large. To address this problem, the paper proposes an improved kNN text classification algorithm based on cluster distribution. First, the training samples of each category are clustered with the k-means algorithm, and all cluster centers are taken as the new training samples. Second, a weight value is introduced that integrates the contribution of large clusters, the effect of dispersed clusters, and the cluster distribution. Finally, the modified samples are used to train the kNN text classifier. Experiments on the Fudan University text classification corpus and the 20 Newsgroups data set show that the proposed algorithm not only effectively reduces the actual number of training samples and lowers the computational complexity, but also improves the accuracy of the kNN text classification algorithm. 1553-9105/Copyright © 2012 Binary Information Press.
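The pipeline described in the abstract (cluster each category, keep only the cluster centers, weight them, then run kNN over the centers) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the weight formula, so the weight used here (each cluster's share of its category's samples) is only a placeholder assumption, and the function names `kmeans`, `build_reduced_training_set`, and `predict` are hypothetical. Documents are assumed to be already converted to non-negative feature vectors such as TF-IDF.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Start from randomly chosen data points.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(n_clusters):
            members = X[labels == j]
            if len(members):  # leave the center of an empty cluster unchanged
                centers[j] = members.mean(axis=0)
    # Final assignment so labels agree with the returned centers.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dists.argmin(axis=1)


def build_reduced_training_set(X, y, clusters_per_class=2):
    """Cluster each category separately and keep only the cluster centers."""
    centers, center_labels, weights = [], [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(clusters_per_class, len(Xc))
        cen, lab = kmeans(Xc, k)
        sizes = np.bincount(lab, minlength=k)
        keep = sizes > 0  # drop clusters that ended up empty
        centers.append(cen[keep])
        center_labels.extend([c] * int(keep.sum()))
        # ASSUMPTION: placeholder weight = cluster's share of its category.
        # The paper's weight also accounts for cluster dispersion and
        # distribution, which the abstract does not specify.
        weights.append(sizes[keep] / len(Xc))
    return np.vstack(centers), np.array(center_labels), np.concatenate(weights)


def predict(x, centers, center_labels, weights, k=3):
    """Weighted kNN vote over the k cluster centers most similar to x."""
    norms = np.linalg.norm(centers, axis=1) * np.linalg.norm(x) + 1e-12
    sims = centers @ x / norms  # cosine similarity to every center
    scores = {}
    for i in np.argsort(sims)[::-1][:k]:
        label = center_labels[i]
        scores[label] = scores.get(label, 0.0) + weights[i] * sims[i]
    return max(scores, key=scores.get)
```

With this sketch, classifying a document compares it against a handful of centers per category instead of every training sample, which is the source of the reduced computation the abstract reports.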
Pages: 1255-1263