Classification of Text Documents based on Naive Bayes using N-Gram Features

Author
Baygin, Mehmet [1]
Affiliation
[1] Ardahan Univ, Dept Comp Engn, TR-75000 Ardahan, Turkey
Keywords
Naive Bayes; machine learning; document classification;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Document classification is the process of correctly assigning documents to predefined categories. It is commonly used in text mining and classifies documents with high-dimensional representations automatically. In this paper, Turkish documents are classified using Naive Bayes, a machine learning method. The approach assigns Turkish documents to five categories quickly and automatically. Its performance was evaluated with the standard criteria of precision, recall, accuracy, and F-measure, and it achieved a success rate of 92%. The source code of the application developed in this paper is released as open source at https://drive.google.com/open?id=1Idp5VK1Q91vyqb940WjeoMpB9dVQuVC9.
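The abstract does not spell out the pipeline. The sketch below illustrates the general technique named in the title: Naive Bayes classification over n-gram features, scored with the metrics the abstract lists. It is not the author's released code. It makes several assumptions. Features are word-level unigrams and bigrams from whitespace tokenization. The classifier is a multinomial Naive Bayes with Laplace (add-alpha) smoothing. Precision, recall, and F-measure are macro-averaged over classes. The names `ngrams`, `NGramNaiveBayes`, and `evaluate` are hypothetical, and so are the toy "sport"/"economy" documents.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_min=1, n_max=2):
    """Return word n-grams of length n_min..n_max (assumption: whitespace tokenization)."""
    # Note: str.lower() does not apply Turkish casing rules (e.g. "I" -> "ı").
    tokens = text.lower().split()
    feats = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append(" ".join(tokens[i:i + n]))
    return feats


class NGramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, n_min=1, n_max=2, alpha=1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # training documents per class
        self.feat_counts = defaultdict(Counter)  # class -> n-gram -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc, self.n_min, self.n_max)
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        # Total n-gram occurrences per class, used in the smoothed denominator.
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feat_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # Log space: log P(c) + sum of log P(g | c) over the document's n-grams.
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * v
            for g in ngrams(doc, self.n_min, self.n_max):
                if g in self.vocab:  # n-grams never seen in training are ignored
                    score += math.log((self.feat_counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F-measure."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precs, recs, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precs.append(prec)
        recs.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    k = len(classes)
    return {"accuracy": accuracy, "precision": sum(precs) / k,
            "recall": sum(recs) / k, "f_measure": sum(f1s) / k}


if __name__ == "__main__":
    # Hypothetical toy corpus; the paper's five categories and data are not reproduced.
    docs = ["mac gol takim kazandi", "takim mac sonunda gol atti",
            "borsa faiz dolar yukseldi", "faiz karari borsa dustu"]
    labels = ["sport", "sport", "economy", "economy"]
    model = NGramNaiveBayes().fit(docs, labels)
    print(model.predict("takim gol atti"))  # sport
    print(evaluate(["sport", "economy", "economy"], ["sport", "economy", "sport"]))
```

Working in log space avoids numeric underflow when many small n-gram probabilities are multiplied. Ignoring out-of-vocabulary n-grams keeps unseen features from shifting scores between classes. A real Turkish pipeline would likely also need locale-aware lowercasing and possibly stemming or character n-grams. Those choices are not specified in the abstract.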
Pages: 5