Influence of data discretization on efficiency of Bayesian classifier for authorship attribution

Cited by: 16
Author
Baron, Grzegorz [1]
Affiliation
[1] Silesian Tech Univ, PL-44100 Gliwice, Poland
Keywords
Bayesian classifier; Naive Bayes; stylometry; authorship attribution; text analysis; classification; discretization; binarization; DECISION TREE; NAIVE
DOI
10.1016/j.procs.2014.08.201
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Authorship attribution is one of the research areas in the data mining domain, and various methods can be employed to perform the task. The paper presents the results of research on the influence of data discretization on the efficiency of the Naive Bayes classifier. The analysis was carried out on datasets based on texts by two male and two female authors, using the WEKA data mining software framework. Binary classification was performed separately for both datasets over a wide range of discretization parameters in order to investigate the dependency between the discretization method and the quality of classification with the Naive Bayes method. The numerical results of the tests are compared and discussed, and some observations and conclusions are formulated. (C) 2014 The Authors. Published by Elsevier B.V.
Pages: 1112-1121
Number of pages: 10
Related papers
50 in total
  • [21] Application of an efficient Bayesian discretization method to biomedical data
    Jonathan L Lustgarten
    Shyam Visweswaran
    Vanathi Gopalakrishnan
    Gregory F Cooper
    BMC Bioinformatics, 12
  • [22] Bayesian Network Classifier for Medical Data Analysis
    Reiz, Beata
    Csato, Lehel
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2009, 4 (01) : 65 - 72
  • [23] Influence of lexical, syntactic and structural features and their combination on Authorship Attribution for Telugu Text
    NagaPrasad, S.
    Narsimha, V. B.
    Reddy, P. Vijayapal
    Babu, A. Vinaya
    INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION AND CONVERGENCE (ICCC 2015), 2015, 48 : 58 - 64
  • [24] Tri-Training for authorship attribution with limited training data: a comprehensive study
    Qian, Tieyun
    Liu, Bing
    Chen, Li
    Peng, Zhiyong
    Zhong, Ming
    He, Guoliang
    Li, Xuhui
    Xu, Gang
    NEUROCOMPUTING, 2016, 171 : 798 - 806
  • [25] Authorship Attribution with Very Few Labeled Data: A Co-training Approach
    Fan, Mengdi
    Qian, Tieyun
    Chen, Li
    Liu, Bin
    Zhong, Ming
    He, Guoliang
    WEB-AGE INFORMATION MANAGEMENT, WAIM 2014, 2014, 8485 : 657 - 668
  • [26] Incremental Bayesian Classifier for Streaming Data with Concept Drift
    Wu, Peng
    Xiong, Ning
    Li, Gang
    Lv, Jinrui
    ADVANCES IN NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, ICNC-FSKD 2022, 2023, 153 : 509 - 518
  • [27] Bayesian Classifier for Medical Data from Doppler Unit
    Malek, J.
    ACTA POLYTECHNICA, 2006, 46 (04) : 21 - 22
  • [28] Temporal Data Driven Naive Bayesian Text Classifier
    Hao, Lili
    Hao, Lizhu
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE FOR YOUNG COMPUTER SCIENTISTS, VOLS 1-5, 2008, : 699 - +
  • [29] A Naive Bayesian Classifier in Categorical Uncertain Data Streams
    Ge, Jiaqi
    Xia, Yuni
    Wang, Jian
    2014 INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2014, : 392 - 398
  • [30] A Pattern-Based Bayesian Classifier for Data Stream
    Yuan, Jidong
    Wang, Zhihai
    Sun, Yange
    Zhang, Wei
    Jiang, Jingjing
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 868 - 877