Author Identification Using Latent Dirichlet Allocation

Times cited: 0
Authors
Calvo, Hiram [1 ,2 ]
Hernandez-Castaneda, Angel [1 ]
Garcia-Flores, Jorge [2 ]
Affiliations
[1] IPN, Ctr Comp Res CIC, Ave JD Batiz E MO Mendizabal, Mexico City 07738, DF, Mexico
[2] Univ Paris 13, Lab Informat Paris Nord, CNRS, UMR 7030,Sorbonne Paris Cite, F-93430 Villetaneuse, France
DOI
10.1007/978-3-319-77116-8_22
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We tackle the author identification task of PAN 2015 using a Latent Dirichlet Allocation (LDA) model. This method takes both the vocabulary and the context of words into account: after a statistical inference process, it estimates to what extent each latent topic (a distribution over words that tend to co-occur) is present in each document. Processing a set of documents with LDA therefore returns one distribution over topics per document. Each distribution can be seen as a feature vector and as a fingerprint of its document within the collection. We then applied a Naive Bayes classifier to the resulting vectors. Performance varied by language: for English we obtained state-of-the-art results, surpassing the best FS score reported at PAN 2015, while results for the other languages were mixed.
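The pipeline described in the abstract (LDA topic distributions as per-document fingerprints, then a Naive Bayes classifier on those vectors) can be illustrated with the minimal, pure-Python sketch below. It rests on several assumptions not stated in the abstract: LDA is fitted with collapsed Gibbs sampling, the classifier is a Gaussian Naive Bayes over topic proportions, and the task is framed as choosing among known authors, which is a simplification and not necessarily how the PAN 2015 task was set up. The helpers `lda_gibbs`, `fit_gaussian_nb` and `predict_gaussian_nb`, the values of `alpha`, `beta` and `iterations`, and the toy corpus are all hypothetical.

```python
import math
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: collapsed Gibbs sampling for LDA.

    Returns one topic distribution (theta) per tokenized document.
    Assumption: alpha, beta and iterations are illustrative, not the paper's values.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    num_words = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]              # tokens of topic k in doc d
    n_kw = [[0] * num_words for _ in range(num_topics)]  # tokens of word w in topic k
    n_k = [0] * num_topics                               # total tokens in topic k
    z = []                                               # topic assignment per token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)  # random initial assignment
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + num_words * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Smoothed document-topic proportions: each row is a document "fingerprint".
    return [[(n_dk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def fit_gaussian_nb(X, y, eps=1e-6):
    """Hypothetical helper (Gaussian Naive Bayes is an assumed variant).

    Stores a log prior plus per-feature means and variances for each class.
    """
    grouped = defaultdict(list)
    for x, label in zip(X, y):
        grouped[label].append(x)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # eps keeps variances positive when a feature is constant within a class.
        variances = [sum((v - m) ** 2 for v in col) / n + eps for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the highest log posterior (up to a shared constant)."""
    def log_post(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                               for xi, m, var in zip(x, means, variances))
    return max(model, key=log_post)


# Hypothetical toy corpus: two "authors" with different habitual vocabulary.
texts = [
    "the ship sailed out of the harbor into the wind",
    "wind and sea carried the ship past the harbor",
    "we train the model on the data and then test the model",
    "test data shows the model needs more training data",
    "the sea was calm when the ship left the harbor",
]
labels = ["author_a", "author_a", "author_b", "author_b"]
docs = [t.lower().split() for t in texts]
# All documents, including the unlabeled last one, are processed by LDA together.
thetas = lda_gibbs(docs, num_topics=2)
model = fit_gaussian_nb(thetas[:-1], labels)
print(predict_gaussian_nb(model, thetas[-1]))
```

A Gaussian variant is used here only because topic proportions are continuous; a multinomial or Bernoulli Naive Bayes over discretized features would be an equally plausible reading of the abstract.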
Pages: 303-312
Number of pages: 10
Related papers (50 in total)
  • [31] Identifying Top Listers in Alphabay Using Latent Dirichlet Allocation
    Grisham, John
    Barreras, Calvin
    Afarin, Cyran
    Patton, Mark
    Chen, Hsinchun
    IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS: CYBERSECURITY AND BIG DATA, 2016, : 219 - 219
  • [32] Obtaining Single Document Summaries Using Latent Dirichlet Allocation
    Nagesh, Karthik
    Murty, M. Narasimha
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT IV, 2012, 7666 : 66 - 74
  • [33] Feature Substitution Using Latent Dirichlet Allocation for Text Classification
    Mathivanan, Norsyela Muhammad Noor
    Janor, Roziah Mohd
    Abd Razak, Shukor
    Ghani, Nor Azura Md.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2025, 16 (01) : 1087 - 1098
  • [34] Distributed Latent Dirichlet Allocation on Streams
    Guo, Yunyan
    Li, Jianzhong
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2022, 16 (01)
  • [35] Parallel Latent Dirichlet Allocation on GPUs
    Moon, Gordon E.
    Nisa, Israt
    Sukumaran-Rajam, Aravind
    Bandyopadhyay, Bortik
    Parthasarathy, Srinivasan
    Sadayappan, P.
    COMPUTATIONAL SCIENCE - ICCS 2018, PT II, 2018, 10861 : 259 - 272
  • [36] Selecting Priors for Latent Dirichlet Allocation
    Syed, Shaheen
    Spruit, Marco
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 194 - 202
  • [37] Crowd Labeling Latent Dirichlet Allocation
    Pion-Tonachini, Luca
    Makeig, Scott
    Kreutz-Delgado, Ken
    KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 53 : 749 - 765
  • [38] Latent IBP Compound Dirichlet Allocation
    Archambeau, Cedric
    Lakshminarayanan, Balaji
    Bouchard, Guillaume
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (02) : 321 - 333
  • [39] Slow mixing for Latent Dirichlet Allocation
    Jonasson, Johan
    STATISTICS & PROBABILITY LETTERS, 2017, 129 : 96 - 100
  • [40] Inference in Supervised Latent Dirichlet Allocation
    Lakshminarayanan, Balaji
    Raich, Raviv
    2011 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2011