Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

Cited by: 120
Authors
Benoit, Kenneth [1, 2]
Conway, Drew [3]
Lauderdale, Benjamin E. [4]
Laver, Michael [3]
Mikhaylov, Slava [5]
Affiliations
[1] London Sch Econ, London, England
[2] Trinity Coll Dublin, Dublin, Ireland
[3] NYU, New York, NY 10003 USA
[4] London Sch Econ & Polit Sci, London, England
[5] UCL, London WC1E 6BT, England
Funding
European Research Council
Keywords
PARTY; RELIABILITY
DOI
10.1017/S0003055416000058
Chinese Library Classification number
D0 [Political science, political theory]
Discipline classification code
0302; 030201
Abstract
Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers' attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
Pages: 278-295
Number of pages: 18