Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

Cited by: 120
Authors
Benoit, Kenneth [1, 2]
Conway, Drew [3]
Lauderdale, Benjamin E. [4]
Laver, Michael [3]
Mikhaylov, Slava [5]
Affiliations
[1] London Sch Econ, London, England
[2] Trinity Coll Dublin, Dublin, Ireland
[3] NYU, New York, NY 10003 USA
[4] London Sch Econ & Polit Sci, London, England
[5] UCL, London WC1E 6BT, England
Funding
European Research Council
Keywords
PARTY; RELIABILITY;
DOI
10.1017/S0003055416000058
Chinese Library Classification (CLC) number
D0 [Political science, political theory]
Discipline classification code
0302 ; 030201 ;
Abstract
Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers' attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
Pages: 278-295
Number of pages: 18