Impact of Character n-grams Attention Scores for English and Russian News Articles Authorship Attribution

Cited by: 0
Authors
Makhmutova, Liliya [1]
Ross, Robert [1]
Salton, Giancarlo [2]
Affiliations
[1] Technol Univ Dublin, Dublin, Ireland
[2] Univ Comunitaria Regiao de Chapeco, Unochapeco, Chapeco, SC, Brazil
Funding
Science Foundation Ireland;
Keywords
Character n-grams; Authorship Attribution task; attention score;
DOI
10.1145/3555776.3577856
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Language embeddings are often used as black-box, word-level tools that provide powerful language analysis across many tasks. For tasks such as Authorship Attribution, however, access to feature-level information on character n-grams can provide insights that help with model refinement and development. In this paper we investigate and evaluate the importance of character n-grams within an embeddings context in authorship attribution through the use of attention scores. We perform this investigation for both English (Reuters_50_50) and Russian (Taiga) news authorship datasets. Our analysis shows that attention scores are higher for the character n-grams that humans consider important for authorship identification. Beyond specific benefits in authorship attribution, this work provides insights into the importance of character n-grams as a unit within embeddings.
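As a rough illustration of the idea in the abstract, the sketch below extracts overlapping character n-grams and averages a softmax-normalized attention weight per n-gram. It is not the authors' model: the function names (`char_ngrams`, `softmax`, `mean_attention_per_ngram`) and the raw scores are hypothetical stand-ins, and in a trained model the scores would presumably come from an attention layer over n-gram embeddings.

```python
import math
from collections import defaultdict


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_attention_per_ngram(ngrams, weights):
    """Average the attention weight over every occurrence of each n-gram."""
    sums, counts = defaultdict(float), defaultdict(int)
    for gram, weight in zip(ngrams, weights):
        sums[gram] += weight
        counts[gram] += 1
    return {gram: sums[gram] / counts[gram] for gram in sums}


grams = char_ngrams("the theme", n=3)

# Hypothetical raw scores (assumption): a placeholder for the values a
# trained attention layer would assign to each n-gram position.
raw_scores = [float(len(set(gram))) for gram in grams]

weights = softmax(raw_scores)
per_ngram = mean_attention_per_ngram(grams, weights)

# Rank n-grams from highest to lowest mean attention weight.
ranking = sorted(per_ngram.items(), key=lambda item: -item[1])
```

Inspecting `ranking` for texts by different authors is one simple way to compare which n-grams receive the most weight; the averaging step is a design choice made here for illustration, not one stated in the abstract.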
Pages: 939-941
Number of pages: 3