An improved N-grams based Model for Authorship Attribution

Cited by: 1
Authors
Boughaci, Dalila [1]
Benmesbah, Mounir [1]
Zebiri, Aniss [1]
Affiliation
[1] Univ Sci & Technol Houari Boumediene, Dept Comp Sci, BP 32 El Alia, Algiers 16111, Algeria
Keywords
Authorship attribution; N-gram; similarity functions; Euclidean distance; text classification; IDENTIFICATION
DOI
10.1109/iccisci.2019.8716391
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Authorship attribution is the problem of analyzing an anonymous text and identifying its author from a set of candidate authors. In this paper, we propose a method based on an N-gram model for authorship attribution. Several similarity measures are used to assign an anonymous text to an author. The variants of the proposed method are implemented and validated on the PAN benchmarks. The numerical results are encouraging and demonstrate the benefit of the proposed approach.
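The abstract does not specify the model's details, so the following is only a minimal sketch of the general idea it describes: build an N-gram profile per candidate author and assign the anonymous text to the closest profile. Everything here is an assumption made for illustration, not the authors' method: the use of character trigrams, relative-frequency profiles, Euclidean distance as the single measure, and the hypothetical names `ngram_profile`, `euclidean_distance`, and `attribute`.

```python
from collections import Counter
import math


def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams in `text` (assumed feature set)."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    # An empty or too-short text yields an empty profile.
    return {g: c / total for g, c in counts.items()} if total else {}


def euclidean_distance(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(p) | set(q)
    return math.sqrt(sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in keys))


def attribute(anonymous_text, known_texts, n=3):
    """Return the candidate author whose profile is closest to the anonymous text.

    `known_texts` maps an author name to a sample of that author's writing.
    """
    target = ngram_profile(anonymous_text, n)
    distances = {
        author: euclidean_distance(target, ngram_profile(sample, n))
        for author, sample in known_texts.items()
    }
    return min(distances, key=distances.get)
```

Other similarity functions could be substituted for `euclidean_distance` without changing `attribute`, which is one plausible way to compare several measures on the same profiles.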
Pages: 70 - 75
Page count: 6