Time-dependent Poisson reduced rank models for political text data analysis

Cited by: 5
Authors
Jentsch, Carsten [1]
Lee, Eun Ryung [2]
Mammen, Enno [3]
Affiliations
[1] TU Dortmund, Fak Stat, D-44221 Dortmund, Germany
[2] Sungkyunkwan Univ, Dept Stat, 25-2 Sungkyunkwan Ro, Seoul 03063, South Korea
[3] Heidelberg Univ, Inst Appl Math, Im Neuenheimer Feld 205, D-69120 Heidelberg, Germany
Funding
National Research Foundation, Singapore;
Keywords
Party manifestos; Text data; Term document matrices; Count data; High-dimensional data; Political spectrum; Political lexicon; Wordfish; INAR time series models; Penalization; LASSO; Fused LASSO; Dimension reduction; ROBUST TRANSFORMATION PROCEDURE; CROSS-CLASSIFICATIONS; PARTY POSITIONS; ASSOCIATION; SERIES; SELECTION; PITFALLS; WORDS; ERROR; LASSO;
DOI
10.1016/j.csda.2019.106813
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
We consider Poisson reduced rank models where parameters depend on time. Our main motivation comes from studies in comparative politics where one wants to locate party positions in a certain political space. For this purpose, several empirical methods have been proposed that use text as a data source. As the data structure of texts is quite complex, extracting information from them is generally a difficult task. Furthermore, political texts usually contain a large number of words, so that a simultaneous analysis of word counts becomes challenging. In this paper, we consider Poisson models for each word count simultaneously and provide a statistical method suitable for analyzing political text data. We consider a novel model which allows the political lexicon to change over time and develop an estimation procedure based on LASSO and fused LASSO penalization techniques to address high-dimensionality via significant dimension reduction. This model gives insights into the potentially changing use of words by left- and right-wing parties over time. The procedure automatically identifies those words that have a discriminating effect between party positions. To address the dependence structure of word counts over time, we propose integer-valued time series processes to implement a suitable bootstrap method for constructing confidence intervals for the model parameters. We apply our approach to party manifesto data from German parties over seven federal elections after German reunification. The approach requires neither a priori information nor expert knowledge to process the data. © 2019 Elsevier B.V. All rights reserved.
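As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates a penalized Poisson negative log-likelihood. The abstract does not state the model equations, so everything here is an assumption: a Wordfish-style log-rate of the form alpha + psi + beta * omega, a LASSO penalty on the word weights, and a fused LASSO penalty on their changes between consecutive elections. The function name, argument names, and array layout are hypothetical and are not taken from the paper.

```python
import numpy as np

def penalized_neg_loglik(counts, alpha, psi, beta, omega, lam_lasso, lam_fused):
    """Penalized Poisson negative log-likelihood (illustrative sketch only).

    Assumed shapes (hypothetical layout, not from the paper):
      counts: (T, I, J) word counts for T elections, I parties, J words
      alpha:  (T, I) party/document effects
      psi:    (T, J) word frequency effects
      beta:   (T, J) word weights (discriminating effect of each word)
      omega:  (T, I) party positions
    """
    # Assumed Wordfish-style log-rate:
    # log(rate[t, i, j]) = alpha[t, i] + psi[t, j] + beta[t, j] * omega[t, i]
    log_rate = (alpha[:, :, None] + psi[:, None, :]
                + beta[:, None, :] * omega[:, :, None])
    # Poisson negative log-likelihood, dropping the constant sum(log(counts!))
    nll = np.sum(np.exp(log_rate) - counts * log_rate)
    # LASSO term: shrinks word weights toward zero (word selection)
    lasso = lam_lasso * np.sum(np.abs(beta))
    # Fused LASSO term: penalizes changes in word weights between elections
    fused = lam_fused * np.sum(np.abs(np.diff(beta, axis=0)))
    return nll + lasso + fused
```

Under these assumptions, the LASSO term would set the weights of non-discriminating words to zero, while the fused term would keep a word's weight constant over time unless the data support a change.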
Pages: 15