Performance Comparison and Optimization of Text Document Classification using k-NN and Naive Bayes Classification Techniques

Cited by: 18

Authors:

Rasjid, Zulfany Erlisa [1]

Setiawan, Reina [1]

Affiliations:

[1] Bina Nusantara Univ, Comp Sci Dept, Jl KH Syahdan 9, Jakarta 11480, Indonesia

Source:

DISCOVERY AND INNOVATION OF COMPUTER SCIENCE TECHNOLOGY IN ARTIFICIAL INTELLIGENCE ERA | 2017 / Vol. 116

Keywords:

k-NN; Naive Bayes; Text Document Classification; Information Retrieval;

DOI:

10.1016/j.procs.2017.10.017

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Information is now available in many formats, such as text, image, video, and audio. A corpus is a large collection of documents. Information Retrieval (IR) makes it possible to obtain unstructured information and to perform automatic summarization, classification, and clustering. This research focuses on data classification using two of the six approaches to data classification: k-NN (k-Nearest Neighbors) and Naive Bayes. The text documents used are in XML format. The corpus was downloaded from the TREC Legal Track and contains more than three thousand text documents in over twenty classes. Of these classes, the six with the most text documents were chosen. The data were processed using RapidMiner software, and the results show that the optimum value of k in k-NN is k=13. With this value of k, the average accuracy reached 55.17 percent, which is better than the 39.01 percent obtained with Naive Bayes. (C) 2017 The Authors. Published by Elsevier B.V.
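
The comparison described in the abstract (classify text documents with k-NN and with Naive Bayes, then sweep k to find the best-performing value) can be illustrated with a minimal sketch. This is not the authors' RapidMiner workflow: the toy corpus, the cosine-similarity bag-of-words k-NN, the Laplace-smoothed multinomial Naive Bayes, and all function names (`knn_predict`, `nb_train`, `nb_predict`, `best_k`) are assumptions introduced here for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (label, text) pairs; NOT the TREC Legal Track data.
TRAIN = [
    ("contract", "the parties agree to the contract terms and payment"),
    ("contract", "breach of contract and payment of damages"),
    ("contract", "contract renewal terms signed by both parties"),
    ("patent", "the patent claims a novel invention"),
    ("patent", "patent infringement of the invention claims"),
    ("patent", "invention disclosure filed with the patent office"),
]
TEST = [
    ("contract", "payment terms of the contract"),
    ("patent", "claims of the invention"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real pipelines do more)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, text, k):
    """Majority vote among the k training documents most similar to text."""
    query = Counter(tokenize(text))
    scored = sorted(
        ((cosine(query, Counter(tokenize(doc))), label) for label, doc in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def nb_train(train):
    """Collect per-class document counts, per-class word counts, and vocabulary."""
    class_docs, word_counts, vocab = Counter(), {}, set()
    for label, doc in train:
        class_docs[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(doc):
            counts[tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def nb_predict(model, text):
    """Multinomial Naive Bayes with Laplace (add-one) smoothing, in log space."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok in vocab:  # skip words never seen in training
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(predict, test):
    """Fraction of test documents whose predicted label matches the true one."""
    return sum(predict(text) == label for label, text in test) / len(test)


def best_k(train, test, candidates):
    """Return the first candidate k with the highest accuracy, plus all scores."""
    scores = {
        k: accuracy(lambda t: knn_predict(train, t, k), test) for k in candidates
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    k, scores = best_k(TRAIN, TEST, [1, 3, 5])
    model = nb_train(TRAIN)
    print("k-NN accuracy per k:", scores, "selected k:", k)
    print("Naive Bayes accuracy:", accuracy(lambda t: nb_predict(model, t), TEST))
```

On a corpus this small the sweep says nothing about which k is best in general; the point is only the shape of the procedure: evaluate each candidate k on held-out documents, keep the best, and compare against Naive Bayes on the same split.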


Pages: 107-112

Page count: 6
