Influence of data discretization on efficiency of Bayesian classifier for authorship attribution

Cited by: 16
Authors
Baron, Grzegorz [1]
Affiliations
[1] Silesian Tech Univ, PL-44100 Gliwice, Poland
Keywords
Bayesian classifier; Naive Bayes; stylometry; authorship attribution; text analysis; classification; discretization; binarization; DECISION TREE; NAIVE;
DOI
10.1016/j.procs.2014.08.201
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Authorship attribution is one of the research areas in the data mining domain, and various methods can be employed to perform this task. The paper presents the results of research on the influence of data discretization on the efficiency of the Naive Bayes classifier. The analysis was carried out on datasets built from texts of two male and two female authors, using the WEKA data mining software framework. Binary classification was performed separately for both datasets over a wide range of discretization parameters in order to investigate the dependency between the method of discretization and the quality of classification with the Naive Bayes method. The numerical results of the tests are compared and discussed, and some observations and conclusions are formulated. © 2014 The Authors. Published by Elsevier B.V.
Pages: 1112-1121
Page count: 10