Authorship Attribution: A Comparative Study of Three Text Corpora and Three Languages

Citations: 14
Authors
Savoy, Jacques [1 ]
Affiliation
[1] Univ Neuchatel, Dept Comp Sci, CH-2000 Neuchatel, Switzerland
Keywords
DELTA;
DOI
10.1080/09296174.2012.659003
Chinese Library Classification number
H0 [Linguistics];
Subject classification codes
030303 ; 0501 ; 050102 ;
Abstract
The first objective of this paper is to carry out three experiments intended to evaluate authorship attribution methods based on three test collections available in three different languages (English, French, and German). In the first, we represent and categorize 52 text excerpts written by nine authors and taken from 19th century English novels. In the second, we work with 44 segments from French novels written by eleven authors, mostly from the 19th century. In the third, we extract 59 German text excerpts from novels published mainly during the 19th and the beginning of the 20th century, written by 15 authors. The second objective is to analyse performance differences obtained when using word types or lemmas as text representations. The third objective is to evaluate three authorship attribution schemes: the first uses principal component analysis (PCA), the second applies the Delta approach, and the third is a new authorship attribution method based on specific vocabulary. This concept is computed for a given text (or author profile) and then compared with the entire corpus. Based on this information, we show how a distance measure can be derived, and by means of the nearest neighbor approach we suggest a simple and efficient authorship attribution scheme. Based on three test collections and using either word types or lemmas as features, we demonstrate that the suggested classification scheme performs better than the PCA method, and slightly better than the Delta approach.
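The Delta approach named in the abstract is commonly described as follows: take the most frequent word types in the corpus, convert their relative frequencies to z-scores, and attribute a disputed text to the author profile with the smallest mean absolute z-score difference (a nearest neighbor decision). The sketch below illustrates that general idea only. It is not the paper's experimental setup, the function names and the toy data are hypothetical, and the paper's own specific-vocabulary distance is not shown because the abstract does not define it.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in a token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]


def delta_attribution(profiles, query_tokens, n_features=50):
    """Attribute query_tokens to the nearest author profile (hypothetical helper).

    profiles: dict mapping author name -> list of tokens (the author profile).
    Returns (best_author, deltas), where deltas maps author -> Delta distance.
    """
    # Features: the n most frequent word types over all profiles.
    corpus_counts = Counter()
    for tokens in profiles.values():
        corpus_counts.update(tokens)
    vocab = [w for w, _ in corpus_counts.most_common(n_features)]

    freqs = {a: relative_freqs(t, vocab) for a, t in profiles.items()}

    # Mean and population standard deviation of each feature across profiles.
    columns = [[f[i] for f in freqs.values()] for i in range(len(vocab))]
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]

    # Drop features that do not vary between profiles (z-score undefined).
    keep = [i for i, s in enumerate(stds) if s > 0]

    def z_scores(f):
        return [(f[i] - means[i]) / stds[i] for i in keep]

    query_z = z_scores(relative_freqs(query_tokens, vocab))

    # Delta: mean absolute difference between z-score vectors.
    deltas = {}
    for author, f in freqs.items():
        author_z = z_scores(f)
        deltas[author] = mean(abs(q - a) for q, a in zip(query_z, author_z))

    best_author = min(deltas, key=deltas.get)
    return best_author, deltas


# Toy data, invented purely for illustration.
profiles = {
    "A": "the the the the of and".split() * 10,
    "B": "the of of of of and".split() * 10,
}
query = "the the the of and and".split() * 5
best, deltas = delta_attribution(profiles, query)
print(best, deltas)
```

Using lemmas instead of word types, as the abstract discusses, would only change how the token lists are produced before they reach this function.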
Pages: 132-161
Page count: 30
Related papers
50 in total
  • [41] Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard
    Bernhard, Delphine
    Ligozat, Anne-Laure
    Martin, Fanny
    Bras, Myriam
    Magistry, Pierre
    Vergez-Couret, Marianne
    Steible, Lucie
    Erhart, Pascale
    Hathout, Nabil
    Huck, Dominique
    Rey, Christophe
    Reynes, Philippe
    Rosset, Sophie
    Sibille, Jean
    Lavergne, Thomas
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 3917 - 3924
  • [42] FRACTIONAL COUNTS FOR AUTHORSHIP ATTRIBUTION - A NUMERICAL STUDY
    BURRELL, Q
    ROUSSEAU, R
    JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE, 1995, 46 (02): : 97 - 102
  • [43] A history by three languages
    Chiellino, G
    AKZENTE-ZEITSCHRIFT FUR LITERATUR, 2005, 52 (05): : 410 - 413
  • [44] 'SPEAKING THREE LANGUAGES'
    JACKETTI, MT
    MUNDUS ARTIUM, 1984, 14 (02): : 48 - 49
  • [45] Influence of lexical, syntactic and structural features and their combination on Authorship Attribution for Telugu Text
    NagaPrasad, S.
    Narsimha, V. B.
    Reddy, P. Vijayapal
    Babu, A. Vinaya
    INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION AND CONVERGENCE (ICCC 2015), 2015, 48 : 58 - 64
  • [46] A language-independent authorship attribution approach for author identification of text documents
    Ramezani, Reza
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 180
  • [47] A Comparative Study of Language Modeling to Instance-Based Methods, and Feature Combinations for Authorship Attribution
    Fourkioti, Olga
    Symeonidis, Symeon
    Arampatzis, Avi
    RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES (TPDL 2017), 2017, 10450 : 274 - 286
  • [48] A comparative study on authorship attribution classification tasks using both neural network and statistical methods
    Tsimboukakis, Nikos
    Tambouratzis, George
    NEURAL COMPUTING & APPLICATIONS, 2010, 19 (04): : 573 - 582
  • [49] Authorship Attribution on Kannada Text using Bi-Directional LSTM Technique
    Chandrika, C. P.
    Kallimani, Jagadish S.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (09) : 963 - 971