Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 11
Authors
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
Affiliation
[1] University of the Aegean, Karlovassi 83200, Greece
Keywords
Authorship Attribution; Neural network language models; Pre-trained language models
DOI
10.1007/978-3-030-49161-1_22
Abstract
Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where the texts of known authorship (the training set) differ from the texts of disputed authorship (the test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus in which topic and genre are specifically controlled across several text genres, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
Pages: 255-266
Number of pages: 12