Utilizing Latent Posting Style for Authorship Attribution on Short Texts

Cited by: 2
Authors
Leepaisomboon, Patamawadee [1]
Iwaihara, Mizuho [1]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Fukuoka, Japan
Keywords
Latent Dirichlet allocation; authorship attribution; sentiment; short text; Twitter; support vector machine; social network
DOI
10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00184
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Character n-grams and word n-grams are the most widely used features for authorship attribution on short texts. In this paper, we propose a new method that exploits latent posting styles estimated from authors' short texts. The posting style features characterize each user's posting style through sentiment orientation and post length. Latent Dirichlet Allocation (LDA) captures concise hidden posting styles from these features, and we consider two types of LDA models. The resulting vectors of latent posting styles are then concatenated with averaged word embeddings of character n-grams and word n-grams and used to train a support vector machine. Our results show that combining latent posting styles with the traditional features can improve the accuracy of authorship attribution by up to 5.2%.
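As a rough illustration of the posting style features described in the abstract, the sketch below turns each post into a single "style token" that combines a sentiment label with a length bucket, then counts those tokens per author. Such per-author counts are the kind of bag-of-tokens input an LDA model could be fitted on. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy sentiment lexicon, the word-count thresholds, and the helper names `sentiment_label`, `length_label`, `style_token`, and `style_document`.

```python
from collections import Counter

# Hypothetical toy lexicon; the sentiment tool used in the paper is not stated here.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "hate", "sad", "awful"}


def sentiment_label(post: str) -> str:
    """Return 'pos', 'neg', or 'neu' from a simple lexicon vote."""
    words = [w.strip(".,!?").lower() for w in post.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return "neu"


def length_label(post: str, short_max: int = 5, medium_max: int = 12) -> str:
    """Bucket a post by word count; the thresholds are illustrative assumptions."""
    n = len(post.split())
    if n <= short_max:
        return "short"
    if n <= medium_max:
        return "medium"
    return "long"


def style_token(post: str) -> str:
    """Combine sentiment and length into one token, e.g. 'pos_short'."""
    return f"{sentiment_label(post)}_{length_label(post)}"


def style_document(posts):
    """Count style tokens over one author's posts (a bag of 'style words')."""
    return Counter(style_token(p) for p in posts)


posts = [
    "I love this!",
    "This is bad, really awful and I hate waiting for it",
    "Meeting at noon",
]
doc = style_document(posts)
```

In a fuller pipeline, the topic proportions inferred by LDA from these counts would be concatenated with the n-gram embedding vector before training the classifier; that part is omitted here because the record gives no details about the two LDA variants.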
Pages: 1015-1022
Number of pages: 8