Influence of data discretization on efficiency of Bayesian classifier for authorship attribution

Cited by: 16
Author
Baron, Grzegorz [1]
Affiliation
[1] Silesian Tech Univ, PL-44100 Gliwice, Poland
Keywords
Bayesian classifier; Naive Bayes; stylometry; authorship attribution; text analysis; classification; discretization; binarization; DECISION TREE; NAIVE
DOI
10.1016/j.procs.2014.08.201
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Authorship attribution is one of the research areas in the data mining domain, and various methods can be employed to perform that task. This paper presents the results of research on the influence of data discretization on the efficiency of the Naive Bayes classifier. The analysis was carried out on datasets based on texts by two male and two female authors, using the WEKA data mining software framework. Binary classification was performed separately for both datasets over a wide range of discretization parameters in order to investigate the dependency between the way the data are discretized and the quality of classification obtained with the Naive Bayes method. The numerical results of the tests are compared and discussed, and some observations and conclusions are formulated. © 2014 The Authors. Published by Elsevier B.V.
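The abstract describes discretizing numeric stylometric features and then classifying the texts of two authors with Naive Bayes. The paper itself used WEKA's implementations. What follows is only a minimal standard-library Python sketch of the general technique, not the author's setup. It makes several assumptions. It uses two unsupervised discretization strategies: equal-width and equal-frequency binning with a configurable number of bins. It uses a categorical Naive Bayes with Laplace (add-one) smoothing. It runs on made-up toy feature values. All function names, class names, and data below are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter, defaultdict


def equal_width_cuts(values, n_bins):
    """Interior cut points splitting [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_cuts(values, n_bins):
    """Interior cut points putting roughly the same number of training values in each bin."""
    s = sorted(values)
    return [s[(len(s) * i) // n_bins] for i in range(1, n_bins)]


def discretize(value, cuts):
    """Map a numeric value to a bin index in 0..len(cuts); values equal to a cut go up."""
    return bisect_right(cuts, value)


def fit_cuts(rows, n_bins, cut_fn):
    """Learn cut points separately for each feature (column) from the training rows."""
    return [cut_fn([row[j] for row in rows], n_bins) for j in range(len(rows[0]))]


def transform(rows, cuts_per_feature):
    """Replace every numeric feature value with its bin index."""
    return [[discretize(v, cuts) for v, cuts in zip(row, cuts_per_feature)] for row in rows]


class DiscreteNaiveBayes:
    """Naive Bayes over discretized features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels, n_bins):
        self.n_bins = n_bins
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # counts[(feature_index, label)][bin_index] -> number of training rows
        self.counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            for j, b in enumerate(row):
                self.counts[(j, label)][b] += 1
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # log P(label) + sum_j log P(bin_j | label), each likelihood add-one smoothed
            score = math.log(n_label / self.n)
            for j, b in enumerate(row):
                score += math.log((self.counts[(j, label)][b] + 1) / (n_label + self.n_bins))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data (not from the paper): two numeric stylometric features per
# text sample, e.g. relative frequencies of two function words; labels are authors.
train_X = [[0.10, 0.30], [0.12, 0.28], [0.11, 0.33], [0.30, 0.10], [0.28, 0.12], [0.33, 0.09]]
train_y = ["A", "A", "A", "B", "B", "B"]

for name, cut_fn in [("equal-width", equal_width_cuts), ("equal-frequency", equal_frequency_cuts)]:
    cuts = fit_cuts(train_X, 3, cut_fn)
    model = DiscreteNaiveBayes().fit(transform(train_X, cuts), train_y, n_bins=3)
    print(name, model.predict(transform([[0.11, 0.31]], cuts)[0]))
# prints:
# equal-width A
# equal-frequency A
```

Varying `n_bins` and the cut function is, in spirit, the kind of parameter sweep the abstract refers to. On real data, classification quality would be measured on held-out texts (for example with cross-validation) rather than from a single toy prediction.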
Pages: 1112-1121
Page count: 10
Related papers
50 in total
  • [31] NEW BAYESIAN SIMPLE CLASSIFIER FOR EDUCATIONAL DATA ANALYSIS
    Oviedo Bayas, Byron
    Zambrano-Vega, Cristian
    REVISTA UNIVERSIDAD Y SOCIEDAD, 2019, 11 (02): 278-285
  • [32] Research on Naive Bayesian Classifier Model in Data Stream
    Xue, Qing
    Cao, Bowei
    Luo, Jia
    Zheng, Changwei
    Yu, Pinggang
    2010 INTERNATIONAL CONFERENCE ON INFORMATION, ELECTRONIC AND COMPUTER SCIENCE, VOLS 1-3, 2010: 2094-2097
  • [33] A Bayesian Approach To Analysing Training Data Attribution In Deep Learning
    Nguyen, Elisa
    Seo, Minjoon
    Oh, Seong Joon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36, NEURIPS 2023, 2023
  • [34] A Modular Bayesian Salmonella Source Attribution Model for Sparse Data
    Mikkela, Antti
    Ranta, Jukka
    Tuominen, Pirkko
    RISK ANALYSIS, 2019, 39 (08): 1796-1811
  • [35] INVESTIGATING THE INFLUENCE OF CLASSIFIER EFFICIENCY ON MILL OUTPUT
    MIZONOV, VE
    USHAKOV, SG
    SHUVALOV, SI
    THERMAL ENGINEERING, 1984, 31 (04): 219-221
  • [36] Influence of Features Discretization on Accuracy of Random Forest Classifier for Web User Identification
    Vorobeva, Alisa A.
    PROCEEDINGS OF THE 20TH CONFERENCE OF OPEN INNOVATIONS ASSOCIATION (FRUCT 2017), 2017: 498-504
  • [37] A selective Bayesian classifier based on change of class relevance influence
    Cheng, Yu-Hu
    Tong, Yao-Yao
    Wang, Xue-Song
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2011, 39 (07): 1628-1633
  • [38] Hybrid Bayesian networks: making the hybrid Bayesian classifier robust to missing training data
    Woody, NA
    Brown, SD
    JOURNAL OF CHEMOMETRICS, 2003, 17 (05): 266-273
  • [39] Data Discretization for Dynamic Bayesian Network Based Modeling of Genetic Networks
    Nguyen Xuan Vinh
    Chetty, Madhu
    Coppel, Ross
    Wangikar, Pramod P.
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT II, 2012, 7664: 298-306
  • [40] Sales Forecasting using Data warehouse and Naive Bayesian classifier
    Katkar, Vijay
    Gangopadhyay, Surupendu Prakash
    Rathod, Sagar
    Shetty, Aakash
    2015 INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING (ICPC), 2015