Utilizing Latent Posting Style for Authorship Attribution on Short Texts

Cited by: 2
Authors
Leepaisomboon, Patamawadee [1]
Iwaihara, Mizuho [1]
Affiliation
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Fukuoka, Japan
Keywords
Latent Dirichlet allocation; authorship attribution; sentiment; short text; Twitter; support vector machine; social network
DOI
10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00184
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Character n-grams and word n-grams are the most widely used features for authorship attribution on short texts. In this paper, we propose a new method that exploits latent posting styles estimated from authors' short texts. The new posting style features characterize each user's posting style through sentiment orientation and post length. Concise hidden posting styles are captured by Latent Dirichlet Allocation (LDA), for which we consider two types of LDA models. The vectors of latent posting styles are then concatenated with averaged word embeddings of character n-grams and word n-grams and used to train a support vector machine. Our results show that combining latent posting styles with the traditional features can improve the accuracy of authorship attribution by up to 5.2%.
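The feature construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The toy sentiment lexicon, the length thresholds, and all function names are hypothetical. To stay dependency-free, raw style-token counts stand in for LDA topic proportions, and raw character n-gram counts stand in for averaged embeddings. In a fuller pipeline, the style documents would be fed to an LDA model and the concatenated vector to a support vector machine.

```python
from collections import Counter

# Hypothetical toy lexicon; the abstract does not say which sentiment tool is used.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "hate", "sad", "awful"}


def sentiment_label(post):
    """Classify a post as pos/neg/neu by counting lexicon hits."""
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neu"


def length_bucket(post, short_max=40, medium_max=100):
    """Bucket a post by character length; the thresholds are assumptions."""
    n = len(post)
    if n <= short_max:
        return "short"
    if n <= medium_max:
        return "medium"
    return "long"


def style_document(posts):
    """Turn an author's posts into one pseudo-document of style tokens.

    A topic model such as LDA could be trained on these pseudo-documents.
    """
    return [f"{sentiment_label(p)}_{length_bucket(p)}" for p in posts]


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def feature_vector(posts, style_vocab, ngram_vocab, n=3):
    """Concatenate style-token counts with character n-gram counts."""
    style_counts = Counter(style_document(posts))
    ngram_counts = Counter(g for p in posts for g in char_ngrams(p.lower(), n))
    style_part = [style_counts[t] for t in style_vocab]
    ngram_part = [ngram_counts[g] for g in ngram_vocab]
    return style_part + ngram_part
```

Combining sentiment and length into a single token per post gives each author a small "vocabulary" of posting behaviours, which is the kind of input a topic model can summarize into a few latent styles.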
Pages: 1015-1022
Number of pages: 8