Stylometric analyses using Dirichlet process mixture models

Cited by: 2
Authors
Gill, Paramjit S. [1]
Swartz, Tim B. [2]
Affiliations
[1] Univ British Columbia Okanagan, IK Barber Sch Arts & Sci, Kelowna, BC V1V 1V7, Canada
[2] Simon Fraser Univ, Dept Stat & Actuarial Sci, Burnaby, BC V5A 1S6, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Bayesian methods; Clustering; Computational linguistics; Dirichlet process priors; Disputed authorship; Federalist papers; Multinomial distribution; Inference
DOI
10.1016/j.jspi.2011.05.020
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Stylometry refers to the statistical analysis of literary style of authors based on the characteristics of expression in their writings. We propose an approach to stylometry based on a Bayesian Dirichlet process mixture model using multinomial word frequency data. The parameters of the multinomial distribution of word frequency data are the "word prints" of the author. Our approach is based on model-based clustering of the vectors of probability values of the multinomial distribution. The resultant clusters identify different writing styles that assist in author attribution for disputed works in a corpus. As a test case, the methodology is applied to the problem of authorship attribution involving the Federalist papers. Our results are consistent with previous stylometric analyses of these papers. © 2011 Elsevier B.V. All rights reserved.
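The clustering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a symmetric Dirichlet base measure (parameter `beta`), a fixed concentration parameter `alpha`, and a collapsed Gibbs sampler over Chinese-restaurant-process cluster labels, none of which are stated in this record. The function names and the toy word counts are hypothetical.

```python
import math
import random

def log_predictive(x, cluster_counts, beta):
    """Log Dirichlet-multinomial predictive of count vector x given a cluster's
    pooled counts, dropping the multinomial coefficient (constant across clusters)."""
    a = [beta + c for c in cluster_counts]
    out = math.lgamma(sum(a)) - math.lgamma(sum(a) + sum(x))
    return out + sum(math.lgamma(av + xv) - math.lgamma(av) for av, xv in zip(a, x))

def gibbs_dp_multinomial(docs, alpha=1.0, beta=0.5, sweeps=50, seed=0):
    """Collapsed Gibbs sampling of cluster labels (assumed sampler, see lead-in)."""
    rng = random.Random(seed)
    n_words = len(docs[0])
    labels = [0] * len(docs)                       # start with one cluster
    counts = {0: [sum(col) for col in zip(*docs)]}  # pooled word counts per cluster
    sizes = {0: len(docs)}                          # number of documents per cluster
    for _ in range(sweeps):
        for i, x in enumerate(docs):
            # Remove document i from its current cluster.
            k_old = labels[i]
            sizes[k_old] -= 1
            counts[k_old] = [c - xv for c, xv in zip(counts[k_old], x)]
            if sizes[k_old] == 0:
                del sizes[k_old], counts[k_old]
            # Weight each existing cluster by size times predictive likelihood,
            # and a brand-new cluster by alpha times the prior predictive.
            keys = list(sizes)
            logw = [math.log(sizes[k]) + log_predictive(x, counts[k], beta) for k in keys]
            logw.append(math.log(alpha) + log_predictive(x, [0] * n_words, beta))
            top = max(logw)
            weights = [math.exp(lw - top) for lw in logw]
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(keys):
                k_new = max(sizes, default=-1) + 1
                sizes[k_new] = 0
                counts[k_new] = [0] * n_words
            else:
                k_new = keys[choice]
            labels[i] = k_new
            sizes[k_new] += 1
            counts[k_new] = [c + xv for c, xv in zip(counts[k_new], x)]
    return labels

# Hypothetical counts of three function words in six documents:
# the first three and last three are built to have different "word prints".
docs = [[30, 5, 2], [28, 6, 3], [31, 4, 2], [3, 6, 29], [2, 5, 30], [4, 7, 27]]
labels = gibbs_dp_multinomial(docs)
print(labels)
```

On this toy corpus the two groups of documents should end up in different clusters, although the sampler is stochastic and the returned labels are a single draw, not a posterior summary. A disputed document would be attributed by checking which known-author documents it tends to share a cluster with across many draws.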
Pages: 3665-3674
Page count: 10
Related papers
50 in total
  • [1] Estimating mixture of Dirichlet process models
    MacEachern, SN
    Muller, P
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 1998, 7 (02) : 223 - 238
  • [2] CLASSIFICATION OF MULTIVARIATE DATA USING DIRICHLET PROCESS MIXTURE MODELS
    Djuric, Petar M.
    Ferrari, Andre
    [J]. 2012 CONFERENCE RECORD OF THE FORTY SIXTH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS (ASILOMAR), 2012, : 441 - 445
  • [3] Dirichlet process mixture models with shrinkage prior
    Ding, Dawei
    Karabatsos, George
    [J]. STAT, 2021, 10 (01):
  • [4] Distributed Inference for Dirichlet Process Mixture Models
    Ge, Hong
    Chen, Yutian
    Wan, Moquan
    Ghahramani, Zoubin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37, 2015, 37 : 2276 - 2284
  • [5] DIRICHLET PROCESS MIXTURE MODELS WITH MULTIPLE MODALITIES
    Paisley, John
    Carin, Lawrence
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 1613 - 1616
  • [6] Distributed MCMC Inference in Dirichlet Process Mixture Models Using Julia
    Dinari, Or
    Yu, Angel
    Freifeld, Oren
    Fisher, John W., III
    [J]. 2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2019, : 518 - 525
  • [7] Background Subtraction with Dirichlet Process Mixture Models
    Haines, Tom S. F.
    Xiang, Tao
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2014, 36 (04) : 670 - 683
  • [8] Collapsed Variational Dirichlet Process Mixture Models
    Kurihara, Kenichi
    Welling, Max
    Teh, Yee Whye
    [J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2796 - 2801
  • [9] Online damage detection of cutting tools using Dirichlet process mixture models
    Wickramarachchi, Chandula T.
    Rogers, Timothy J.
    McLeay, Thomas E.
    Leahy, Wayne
    Cross, Elizabeth J.
    [J]. MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 2022, 180
  • [10] Fast Bayesian Inference in Dirichlet Process Mixture Models
    Wang, Lianming
    Dunson, David B.
    [J]. JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2011, 20 (01) : 196 - 216