Stylometric analyses using Dirichlet process mixture models

Cited by: 2
Authors
Gill, Paramjit S. [1 ]
Swartz, Tim B. [2 ]
Affiliations
[1] Univ British Columbia Okanagan, IK Barber Sch Arts & Sci, Kelowna, BC V1V 1V7, Canada
[2] Simon Fraser Univ, Dept Stat & Actuarial Sci, Burnaby, BC V5A 1S6, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Bayesian methods; Clustering; Computational linguistics; Dirichlet process priors; Disputed authorship; Federalist papers; Multinomial distribution; INFERENCE;
DOI
10.1016/j.jspi.2011.05.020
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Stylometry refers to the statistical analysis of literary style of authors based on the characteristics of expression in their writings. We propose an approach to stylometry based on a Bayesian Dirichlet process mixture model using multinomial word frequency data. The parameters of the multinomial distribution of word frequency data are the "word prints" of the author. Our approach is based on model-based clustering of the vectors of probability values of the multinomial distribution. The resultant clusters identify different writing styles that assist in author attribution for disputed works in a corpus. As a test case, the methodology is applied to the problem of authorship attribution involving the Federalist papers. Our results are consistent with previous stylometric analyses of these papers. © 2011 Elsevier B.V. All rights reserved.
Pages: 3665-3674
Page count: 10
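
Illustrative sketch (not taken from the paper): the abstract describes clustering texts by their multinomial word-frequency vectors under a Dirichlet process mixture. One common way to fit such a model is a collapsed Gibbs sampler based on the Chinese restaurant process, shown below. This is not necessarily the authors' algorithm. The symmetric Dirichlet base measure, the hyperparameters `alpha` and `beta`, the function names, and the toy counts are all assumptions made for illustration.

```python
import math
import random


def log_predictive(x, cluster_counts, beta):
    """Log Dirichlet-multinomial predictive of count vector x, given the
    word counts already assigned to a cluster. The multinomial coefficient
    is omitted because it is identical for every candidate cluster."""
    total = sum(cluster_counts) + beta * len(x)
    out = math.lgamma(total) - math.lgamma(total + sum(x))
    for c, xv in zip(cluster_counts, x):
        out += math.lgamma(c + beta + xv) - math.lgamma(c + beta)
    return out


def gibbs_cluster(docs, alpha=1.0, beta=0.5, sweeps=50, seed=0):
    """Return one posterior draw of cluster labels for the texts in docs.

    docs  : list of equal-length word-count vectors, one per text
    alpha : DP concentration (assumed value)
    beta  : symmetric Dirichlet base-measure parameter (assumed value)
    """
    rng = random.Random(seed)
    vocab = len(docs[0])
    labels = [0] * len(docs)  # start with every text in a single cluster
    sizes = {0: len(docs)}
    counts = {0: [sum(d[v] for d in docs) for v in range(vocab)]}
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(docs):
            # Remove text i from its current cluster.
            old = labels[i]
            sizes[old] -= 1
            counts[old] = [c - xv for c, xv in zip(counts[old], x)]
            if sizes[old] == 0:
                del sizes[old], counts[old]
            # Weight of each existing cluster: size times predictive.
            cands = list(sizes)
            logw = [math.log(sizes[j]) + log_predictive(x, counts[j], beta)
                    for j in cands]
            # Weight of opening a new cluster: alpha times prior predictive.
            cands.append(next_label)
            logw.append(math.log(alpha)
                        + log_predictive(x, [0] * vocab, beta))
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            new = rng.choices(cands, weights=weights)[0]
            if new == next_label:
                sizes[new] = 0
                counts[new] = [0] * vocab
                next_label += 1
            # Add text i to the sampled cluster.
            labels[i] = new
            sizes[new] += 1
            counts[new] = [c + xv for c, xv in zip(counts[new], x)]
    return labels


# Hypothetical toy data: counts of three function words in six texts.
toy_docs = [
    [30, 5, 5], [28, 6, 6], [32, 4, 4],
    [5, 30, 5], [6, 28, 6], [4, 32, 4],
]
print(gibbs_cluster(toy_docs, seed=1))
```

The returned labels are a single draw from the sampler, so they vary with the seed and can include short-lived singleton clusters. A real analysis would summarize many draws, for example through the proportion of draws in which two texts share a cluster.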