Classification of Text Documents based on Naive Bayes using N-Gram Features

Cited by: 0

Authors:

Baygin, Mehmet [1]

Affiliations:

[1] Ardahan Univ, Dept Comp Engn, TR-75000 Ardahan, Turkey

Source:

2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP) | 2018

Keywords:

Naive Bayes; machine learning; document classification;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Document classification is the process of assigning documents to the correct predefined categories. Widely used in text mining, it allows large document collections to be classified automatically. In this paper, Turkish documents are classified with the Naive Bayes approach, a machine learning method. The approach, which uses 5 categories, classifies Turkish documents quickly and automatically. Its performance was measured with the standard evaluation criteria of precision, recall, accuracy and F-measure, and it achieved a success rate of 92%. The source code of the application developed in this paper is available as open source at https://drive.google.com/open?id=1Idp5VK1Q91vyqb940WjeoMpB9dVQuVC9.
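As a rough illustration of the technique named in the title and abstract, the sketch below trains a multinomial Naive Bayes classifier on word n-gram counts and computes the evaluation criteria the abstract lists. It is not the paper's code: the choice of word-level unigrams and bigrams, the Laplace smoothing, the two toy categories, the sample sentences, and every function and class name are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return lowercase word n-grams of length 1..n_max (assumed feature set)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, docs, labels, n_max=2):
        self.n_max = n_max
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        for doc, label in zip(docs, labels):
            self.feature_counts[label].update(ngrams(doc, n_max))
        self.vocab = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total_docs)     # log prior
            total = sum(self.feature_counts[label].values())
            for gram in ngrams(doc, self.n_max):
                if gram in self.vocab:               # ignore unseen n-grams
                    seen = self.feature_counts[label][gram]
                    score += math.log((seen + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy corpus (two categories instead of the paper's five).
train_docs = [
    "takim mac kazandi gol",
    "futbol mac gol atti",
    "borsa faiz dolar yukseldi",
    "dolar faiz enflasyon dustu",
]
train_labels = ["spor", "spor", "ekonomi", "ekonomi"]
model = NaiveBayes().fit(train_docs, train_labels)
```

Calling `model.predict("mac gol")` scores the document against each category and returns the highest-scoring label. For more than two categories, `precision_recall_f1` can be called once per category and the results averaged.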

Pages: 5
