Authorship Attribution: A Comparative Study of Three Text Corpora and Three Languages

Cited by: 14
Author
Savoy, Jacques [1]
Affiliation
[1] Univ Neuchatel, Dept Comp Sci, CH-2000 Neuchatel, Switzerland
Keywords
DELTA;
DOI
10.1080/09296174.2012.659003
Chinese Library Classification number
H0 [Linguistics];
Subject classification codes
030303; 0501; 050102;
Abstract
The first objective of this paper is to carry out three experiments intended to evaluate authorship attribution methods based on three test collections available in three different languages (English, French, and German). In the first, we represent and categorize 52 text excerpts written by nine authors and taken from 19th-century English novels. In the second, we work with 44 segments from French novels written by eleven authors, mostly from the 19th century. In the third, we extract 59 German text excerpts from novels published mainly during the 19th and the beginning of the 20th century, written by 15 authors. The second objective is to analyse performance differences obtained when using word types or lemmas as text representations. The third objective is to evaluate three authorship attribution schemes: the first uses principal component analysis (PCA), the second applies the Delta approach, and the third is a new authorship attribution method based on specific vocabulary. This concept is computed for a given text (or author profile) and then compared with the entire corpus. Based on this information, we show how a distance measure can be derived and, by means of the nearest neighbor approach, we suggest a simple and efficient authorship attribution scheme. Based on three test collections and using either word types or lemmas as features, we demonstrate that the suggested classification scheme performs better than the PCA method, and slightly better than the Delta approach.
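The abstract names the Delta approach combined with nearest-neighbor assignment but gives no implementation details. As a rough illustration only, the sketch below follows the commonly described form of Burrows' Delta (mean absolute difference of z-scored relative frequencies over the most frequent words). It is not taken from the paper: the function names `rel_freqs` and `delta_attribution`, the `n_words` parameter, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean, pstdev


def rel_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in vocab]


def delta_attribution(profiles, query_tokens, n_words=50):
    """Assign a query text to the nearest author profile under Delta.

    profiles: dict mapping author name -> list of tokens (hypothetical layout).
    Returns (best_author, distances_by_author).
    """
    # Feature set: the n_words most frequent words over all profiles.
    pooled = Counter()
    for tokens in profiles.values():
        pooled.update(tokens)
    vocab = [w for w, _ in pooled.most_common(n_words)]

    # Relative frequencies per author, then per-word mean and std. deviation.
    freqs = {a: rel_freqs(t, vocab) for a, t in profiles.items()}
    columns = list(zip(*freqs.values()))
    mu = [mean(c) for c in columns]
    sigma = [pstdev(c) or 1.0 for c in columns]  # guard constant features

    def zscores(vector):
        return [(x - m) / s for x, m, s in zip(vector, mu, sigma)]

    zq = zscores(rel_freqs(query_tokens, vocab))
    # Delta: mean absolute difference between z-score vectors.
    dists = {
        a: mean([abs(p - q) for p, q in zip(zscores(f), zq)])
        for a, f in freqs.items()
    }
    # Nearest neighbor: the author with the smallest distance.
    return min(dists, key=dists.get), dists


# Toy data, invented purely for illustration.
profiles = {
    "A": "the the the of and the a the".split(),
    "B": "of of of and a of the of".split(),
}
query = "the the of the and the".split()
best, dists = delta_attribution(profiles, query)
print(best, dists)
```

Real experiments would use far longer texts and a few hundred frequent words; the toy corpus here only shows the mechanics.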
Pages: 132-161
Number of pages: 30