A simple KNN algorithm for text categorization

Cited by: 114
Authors
Soucy, P [1]
Mineau, GW [1]
Affiliation
[1] Univ Laval, Dept Comp Sci, Quebec City, PQ G1K 7P4, Canada
DOI
10.1109/ICDM.2001.989592
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization (also called text classification) is the process of identifying the class to which a text document belongs. This paper proposes a simple KNN algorithm with non-weighted features for text categorization. We propose a feature selection method that uses feature interaction (based on word interdependencies) to find the features relevant to the learning task at hand. This allows us to considerably reduce the number of selected features from which to learn, making our KNN algorithm applicable in contexts where both the volume of documents and the size of the vocabulary are high, as on the World Wide Web. The proposed KNN algorithm therefore becomes efficient for classifying text documents in that context (in terms of its predictability and interpretability), as will be demonstrated. Its simplicity (with respect to implementation and fine-tuning) is its main asset for on-the-field applications.
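As a rough illustration of the idea in the abstract, the following Python sketch classifies a document with a KNN vote over binary (non-weighted) word features drawn from a small selected vocabulary. It is not the authors' implementation: the abstract does not specify the feature selection procedure or the similarity measure, so the hand-picked `vocabulary`, the Jaccard similarity, the toy training set, and all function names are assumptions made for this example only.

```python
from collections import Counter

def featurize(text, vocabulary):
    """Return the set of selected features present in the text (binary, no weights)."""
    return {tok for tok in text.lower().split() if tok in vocabulary}

def jaccard(a, b):
    """Jaccard similarity between two feature sets (an assumed choice of measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def knn_classify(text, training, vocabulary, k=3):
    """Label a document by majority vote among its k most similar training documents."""
    query = featurize(text, vocabulary)
    scored = sorted(
        ((jaccard(query, featurize(doc, vocabulary)), label) for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical selected vocabulary; the paper derives its features from
# word interdependencies, which this sketch does not attempt to reproduce.
vocabulary = {
    "football", "match", "goal", "striker",
    "bank", "rates", "stock", "shares", "investors", "market",
}

# Toy labeled documents, invented for illustration.
training = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the match ended with a goal in extra time", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock market shares fell sharply", "finance"),
    ("investors sold shares as rates rose", "finance"),
]

print(knn_classify("a goal decided the football match", training, vocabulary))  # sports
print(knn_classify("shares and rates moved the market", training, vocabulary))  # finance
```

Restricting documents to a small feature set keeps each comparison cheap, which is the property the abstract relies on when both the document collection and the vocabulary are large.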
Pages: 647-648
Page count: 2
Related papers
50 in total
  • [1] Using KNN Algorithm for Text Categorization
    Wajeed, M. A.
    Adilakshmi, T.
    [J]. COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250 : 796 - +
  • [2] A KNN BASED ALGORITHM FOR TEXT CATEGORIZATION
    Bucar, Joze
    Povh, Janez
    [J]. SOR'13 PROCEEDINGS: THE 12TH INTERNATIONAL SYMPOSIUM ON OPERATIONAL RESEARCH IN SLOVENIA, 2013, : 367 - 372
  • [3] A fast KNN algorithm for text categorization
    Wang, Yu
    Wang, Zheng-Ou
    [J]. PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 3436 - +
  • [4] KNN Text Categorization Algorithm Based on Semantic Centre
    Zhang Xiao-fei
    Huang He-yan
    Zhang Ke-liang
    [J]. 2009 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND COMPUTER SCIENCE, VOL 1, PROCEEDINGS, 2009, : 249 - +
  • [5] The Research of kNN Text Categorization Algorithm Based On Eager Learning
    Dong, Tao
    Cheng, Weinan
    Shang, Wenqian
    [J]. 2012 INTERNATIONAL CONFERENCE ON INDUSTRIAL CONTROL AND ELECTRONICS ENGINEERING (ICICEE), 2012, : 1120 - 1123
  • [6] Graph based KNN for Text Categorization
    Jo, Taeho
    [J]. 2018 20TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT), 2018, : 260 - 265
  • [7] String Vector based KNN for Text Categorization
    Jo, Taeho
    [J]. 2017 19TH INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATIONS TECHNOLOGY (ICACT) - OPENING NEW ERA OF SMART SOCIETY, 2017, : 458 - 463
  • [8] Using kNN model for automatic text categorization
    Gongde Guo
    Hui Wang
    David Bell
    Yaxin Bi
    Kieran Greer
    [J]. Soft Computing, 2006, 10 : 423 - 430
  • [9] Using kNN model for automatic text categorization
    Guo, GD
    Wang, H
    Bell, D
    Bi, YX
    Greer, K
    [J]. SOFT COMPUTING, 2006, 10 (05) : 423 - 430
  • [10] The Analysis and Optimization of KNN Algorithm Space-Time Efficiency for Chinese Text Categorization
    Cai, Ying
    Wang, Xiaofei
    [J]. ADVANCES IN COMPUTER SCIENCE, ENVIRONMENT, ECOINFORMATICS, AND EDUCATION, PT I, 2011, 214 : 542 - 550