Influence of data discretization on efficiency of Bayesian classifier for authorship attribution

Cited by: 16
Authors
Baron, Grzegorz [1]
Affiliation
[1] Silesian Tech Univ, PL-44100 Gliwice, Poland
Keywords
Bayesian classifier; Naive Bayes; stylometry; authorship attribution; text analysis; classification; discretization; binarization; decision tree; naive
DOI
10.1016/j.procs.2014.08.201
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Authorship attribution is one of the research areas in the data mining domain, and various methods can be employed for that task. The paper presents the results of research on the influence of data discretization on the efficiency of the Naive Bayes classifier. The analysis was carried out on datasets based on texts of two male and two female authors, using the WEKA data mining software framework. Binary classification was performed separately for both datasets over a wide range of discretization parameters in order to investigate the dependency between the way the data are discretized and the quality of classification with the Naive Bayes method. The numerical results of the tests are compared and discussed, and some observations and conclusions are formulated. (C) 2014 The Authors. Published by Elsevier B.V.
Pages: 1112-1121
Page count: 10