Unifying Lexical, Syntactic, and Structural Representations of Written Language for Authorship Attribution

Authors
Jafariakinabad F. [1]
Hua K.A. [1]
Affiliation
[1] University of Central Florida, Orlando, FL
Keywords
Authorship attribution; Deep neural networks; Document analysis; Natural language processing; Syntax encoding
DOI
10.1007/s42979-021-00911-2
Abstract
Writing style in written language is a combination of consistent decisions associated with a specific author at different levels of language production, including lexical, syntactic, and structural. Recent work in neural-network-based style analysis mostly lacks multi-level modeling of writing style. In this paper, we introduce a style-aware neural model that encodes document information from three stylistic levels, and we evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences that contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature. Additionally, we adopt a transfer learning approach and use deep contextualized word representations (ELMo) in our model to measure the impact of the lower-level versus higher-level linguistic representations of ELMo on the task of authorship attribution. According to our experimental results, lower-level linguistic representations, which mainly carry syntactic information, perform better in the authorship attribution task than higher-level linguistic representations, which mainly carry semantic information. © 2021, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
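The abstract does not spell out how the lexical and syntactic representations are combined or how sentences are weighted, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes (hypothetically) that each word has a lexical and a syntactic embedding that are fused by concatenation, that a sentence vector is the mean of its word vectors, and that sentences are weighted by a softmax over dot products with a context vector. All function names and toy values are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_word_vector(lexical, syntactic):
    """Fuse a word's lexical and syntactic embeddings by concatenation.

    Concatenation is an assumption made for this sketch; the abstract
    only says the two representations are encoded jointly.
    """
    return list(lexical) + list(syntactic)


def sentence_vector(word_vectors):
    """Average the word vectors; a stand-in for a learned sentence encoder."""
    count = len(word_vectors)
    return [sum(column) / count for column in zip(*word_vectors)]


def attention_pool(sentence_vectors, context):
    """Return (document vector, attention weights).

    Each sentence is scored by its dot product with `context`; the
    softmax of those scores decides how much it contributes.
    """
    scores = [sum(a * b for a, b in zip(vec, context)) for vec in sentence_vectors]
    weights = softmax(scores)
    dim = len(context)
    doc = [sum(w * vec[i] for w, vec in zip(weights, sentence_vectors)) for i in range(dim)]
    return doc, weights


# Toy document: two sentences, 2-d lexical and 2-d syntactic embeddings per word.
sentence_one = sentence_vector([
    joint_word_vector([1.0, 0.0], [0.0, 1.0]),
    joint_word_vector([0.0, 1.0], [0.0, 1.0]),
])
sentence_two = sentence_vector([
    joint_word_vector([1.0, 1.0], [1.0, 0.0]),
])

# Hypothetical context vector that attends to the last syntactic dimension.
context = [0.0, 0.0, 0.0, 1.0]
doc_vec, weights = attention_pool([sentence_one, sentence_two], context)
```

In this toy setup the first sentence scores higher against the context vector, so it receives the larger attention weight. In a trained model the context vector and the encoders would be learned from data rather than fixed by hand.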