Performance Comparison and Optimization of Text Document Classification using k-NN and Naive Bayes Classification Techniques

Cited by: 18
Authors
Rasjid, Zulfany Erlisa [1]
Setiawan, Reina [1]
Affiliation
[1] Bina Nusantara Univ, Comp Sci Dept, Jl KH Syahdan 9, Jakarta 11480, Indonesia
Keywords
k-NN; Naive Bayes; Text Document Classification; Information Retrieval;
DOI
10.1016/j.procs.2017.10.017
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the current era, information is available in several different formats, such as text, image, video, audio and others. A corpus is a large collection of documents. Information Retrieval (IR) makes it possible to obtain unstructured information and to perform automatic summarization, classification and clustering. This research focuses on data classification using two of the six approaches to data classification: k-NN (k-Nearest Neighbors) and Naive Bayes. The text documents used are in XML format. The corpus used in this research was downloaded from TREC Legal Track and contains more than three thousand text documents and over twenty types of classifications. Of these, the six with the largest number of text documents are chosen. The data is processed using RapidMiner software, and the results show that the optimum value for k in k-NN occurs at k=13. Using this value for k, the average accuracy reached 55.17 percent, which is better than Naive Bayes at 39.01 percent. (C) 2017 The Authors. Published by Elsevier B.V.
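To make the two techniques named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the paper used RapidMiner on the TREC Legal Track corpus, whereas this example assumes a tiny made-up training set (`TRAIN`), whitespace tokenization, cosine similarity for k-NN, and Laplace-smoothed multinomial Naive Bayes. All function names are hypothetical. The toy data is far too small for k=13, so k=3 is used purely for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, text, k):
    # Rank training documents by similarity to the query; majority vote over top k.
    query = Counter(tokenize(text))
    ranked = sorted(train,
                    key=lambda d: cosine(Counter(tokenize(d[0])), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def nb_train(train):
    # Collect per-class document counts and per-class word counts.
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in train:
        class_docs[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab

def nb_predict(model, text):
    # Pick the class with the highest log prior plus summed log likelihoods.
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        total = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # words unseen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log((word_counts[label][tok] + 1)
                                  / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus, not taken from the paper.
TRAIN = [
    ("court ruling on contract dispute", "legal"),
    ("judge dismissed the contract claim", "legal"),
    ("appeal filed in district court", "legal"),
    ("quarterly revenue and profit report", "finance"),
    ("stock price and revenue forecast", "finance"),
    ("bank profit rose this quarter", "finance"),
]

if __name__ == "__main__":
    model = nb_train(TRAIN)
    for doc in ["court contract appeal", "revenue profit forecast"]:
        print(doc, "| k-NN:", knn_predict(TRAIN, doc, k=3),
              "| NB:", nb_predict(model, doc))
```

On a real corpus, k would typically be tuned by comparing accuracy across candidate values on held-out data, which is how an optimum such as the reported k=13 would be found.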
Pages: 107-112
Page count: 6