Unifying Lexical, Syntactic, and Structural Representations of Written Language for Authorship Attribution

Cited by: 0
Authors
Jafariakinabad F. [1]
Hua K.A. [1]
Affiliations
[1] University of Central Florida, Orlando, FL
Keywords
Authorship attribution; Deep neural networks; Document analysis; Natural language processing; Syntax encoding
DOI
10.1007/s42979-021-00911-2
Abstract
Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including the lexical, syntactic, and structural levels. Recent work in neural-network-based style analysis largely lacks multi-level modeling of writing style. In this paper, we introduce a style-aware neural model that encodes document information from three stylistic levels, and we evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences that contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to the baseline methods in the literature. Additionally, we adopt a transfer learning approach and use deep contextualized word representations (ELMo) in our model to measure the impact of the lower-level versus higher-level linguistic representations of ELMo on the task of authorship attribution. According to our experimental results, lower-level linguistic representations, which mainly carry syntactic information, perform better in the authorship attribution task than higher-level linguistic representations, which mainly carry semantic information. © 2021, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
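The two encoding steps named in the abstract (a joint lexical and syntactic sentence encoding, followed by attention over sentences) can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give the architecture details, so the random embeddings, the use of part-of-speech tags as the syntactic signal, the mean pooling over tokens, the dot-product attention, and all function names below are assumptions made purely for illustration.

```python
import math
import random


def make_embedding(symbols, dim, rng):
    """Give each symbol a random vector (hypothetical stand-in for learned embeddings)."""
    return {s: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for s in sorted(symbols)}


def encode_sentence(words, tags, word_emb, tag_emb):
    """Concatenate the word and POS-tag vectors of each token, then mean-pool over tokens."""
    token_vecs = [word_emb[w] + tag_emb[t] for w, t in zip(words, tags)]
    return [sum(column) / len(token_vecs) for column in zip(*token_vecs)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(sentence_vecs, context):
    """Weight each sentence by softmax(dot(sentence, context)) and sum into one document vector."""
    scores = [sum(a * b for a, b in zip(vec, context)) for vec in sentence_vecs]
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    doc = [sum(w * vec[i] for w, vec in zip(weights, sentence_vecs)) for i in range(dim)]
    return doc, weights


rng = random.Random(0)

# Toy document: each sentence is (words, POS tags). The tags are assumed to be given.
document = [
    (["the", "cat", "sat"], ["DET", "NOUN", "VERB"]),
    (["it", "slept"], ["PRON", "VERB"]),
]

word_emb = make_embedding({w for ws, _ in document for w in ws}, 4, rng)
tag_emb = make_embedding({t for _, ts in document for t in ts}, 2, rng)

# Lexical (4 dims) and syntactic (2 dims) parts are joined into a 6-dim sentence vector.
sentence_vecs = [encode_sentence(ws, ts, word_emb, tag_emb) for ws, ts in document]

# The context vector would be learned in a trained model; here it is random.
context = [rng.uniform(-1.0, 1.0) for _ in range(6)]
doc_vec, weights = attend(sentence_vecs, context)
```

In a trained model the attention weights would indicate which sentences contribute most to the style representation; here they only show the mechanics, since nothing is learned.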