Sweet tweets! Evaluating a new approach for probability-based sampling of Twitter

Cited by: 3
Authors
Buskirk, Trent D. [1]
Blakely, Brian P. [1]
Eck, Adam [1]
McGrath, Richard [1]
Singh, Ravinder [1]
Yu, Youzhi [1]
Affiliation
[1] Bowling Green State Univ, Bowling Green, OH 43403 USA
Keywords
Twitter; Probability sampling; Tweets; Social media; COVID-19; Big data; Survey research
DOI
10.1140/epjds/s13688-022-00321-1
Chinese Library Classification
O1 [Mathematics]
Subject classification codes
0701; 070101
Abstract
As survey costs continue to rise and response rates decline, researchers are seeking more cost-effective ways to collect, analyze and process social and public opinion data. These issues have created an opportunity and interest in expanding the fit-for-purpose paradigm to include alternative sources such as passively collected sensor data and social media data. However, methods for accessing, sourcing and sampling social media data are only now being developed. A small but growing body of literature compares different Twitter data access methods, either through the comprehensive Firehose or through the free Twitter Search or Streaming APIs. Missing from this literature is a good understanding of how to randomly sample Tweets to produce datasets that are representative of the daily discourse, especially within geographical regions of interest, without requiring a census of all Tweets. This understanding is necessary for producing quality estimates of public opinion from social media sources such as Twitter. To address this gap, we propose and test the Velocity-Based Estimation for Sampling Tweets (VBEST) algorithm for selecting a probability-based sample of Tweets. We compare VBEST sample estimates with estimates from other common methods of accessing Twitter through the Search API, both on the distribution of total Tweets and on COVID-19 keyword incidence and frequency, and find that VBEST samples produce consistent and relatively low levels of overall bias across many experimental conditions.
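The abstract does not describe how VBEST works internally, so the sketch below is not the authors' algorithm. It is a generic illustration, under stated assumptions, of what a probability-based sample of Tweets buys over a convenience sample: draw a simple random sample of n time units (e.g., minutes) out of the N in a day, then expand the sampled Tweet counts to an estimate of the daily total with the standard expansion (Horvitz-Thompson) estimator. Each unit has inclusion probability n/N, so the estimate is (N/n) times the sampled sum, which is design-unbiased for the true total. The per-minute census `counts` and the helper names `estimate_total` and `empirical_bias` are hypothetical.

```python
import random
from statistics import mean


def estimate_total(counts, n, rng):
    """Expansion (Horvitz-Thompson) estimate of the total from an SRS of n time units.

    counts: hypothetical per-unit Tweet counts for the full day (a census of N units).
    Under simple random sampling without replacement every unit has inclusion
    probability n / N, so each sampled count is weighted by N / n.
    """
    N = len(counts)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the number of time units")
    sampled = rng.sample(range(N), n)  # SRS of unit indices, without replacement
    return (N / n) * sum(counts[i] for i in sampled)


def empirical_bias(counts, n, reps, seed=0):
    """Average estimate minus the true total over repeated samples (Monte Carlo bias)."""
    rng = random.Random(seed)
    true_total = sum(counts)
    estimates = [estimate_total(counts, n, rng) for _ in range(reps)]
    return mean(estimates) - true_total


if __name__ == "__main__":
    # Hypothetical synthetic day: 1,440 minutes with a made-up repeating pattern.
    counts = [50 + (i % 60) for i in range(1440)]
    print("true total:", sum(counts))
    print("one estimate from 60 sampled minutes:", estimate_total(counts, 60, random.Random(1)))
    print("Monte Carlo bias over 2,000 samples:", empirical_bias(counts, 60, 2000))
```

A single estimate from 60 sampled minutes varies around the true total, but its average over many replications converges on it. This is the sense in which a probability sample of time units yields low bias without requiring a census of all Tweets; a real design would also have to handle API limits and geographic filtering, which this sketch ignores.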
Pages: 32
Related papers (50 in total)
  • [41] A probability-based approach for multi-scale image feature extraction
    Thanh Le
    Schuff, Norbert
    2014 11TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS (ITNG), 2014 : 143 - 148
  • [42] Reliability-Aware VNF Placement Using a Probability-Based Approach
    Wu, Yunyi
    Zheng, Weichang
    Zhang, Yongbing
    Li, Jie
    IEEE TRANSACTIONS ON NETWORK AND SERVICE MANAGEMENT, 2021, 18 (03) : 2478 - 2491
  • [43] Twitter-User Recommender System using Tweets: A Content-based Approach
    Nidhi, R. H.
    Annappa, B.
    2017 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN DATA SCIENCE (ICCIDS), 2017
  • [44] A New Hybrid Probability-Based Method for Identifying Proteins and Protein Modifications
    Wang, Penghao
    Wilson, Susan R.
    PROCEEDINGS OF THE 2013 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), 2013 : 1 - 8
  • [45] Probability-based assessment of highway bridges according to the new Danish guideline
    O'Connor, A.
    Enevoldsen, I.
    STRUCTURE AND INFRASTRUCTURE ENGINEERING, 2009, 5 (02) : 157 - 168
  • [46] Use of probability-based sampling of water-quality indicators in supporting development of quality criteria
    Nelson, Walter G.
    Brown, Cheryl A.
    ICES JOURNAL OF MARINE SCIENCE, 2008, 65 (08) : 1421 - 1427
  • [47] Probability-based approach for parametrisation of traditional underfrequency load-shedding schemes
    Bogovic, Jerneja
    Rudez, Urban
    Mihalic, Rafael
    IET GENERATION TRANSMISSION & DISTRIBUTION, 2015, 9 (16) : 2625 - 2632
  • [48] The Impact of Access Points Placement on Indoor Positioning Systems: A Probability-Based Approach
    Youssef, Ahmed A. F.
    Abi-Char, Pierre E.
    2019 42ND INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2019 : 459 - 465
  • [49] A clustering- and probability-based approach for time-multiplexed FPGA partitioning
    Wu, GM
    Chao, MCT
    Chang, YW
    INTEGRATION-THE VLSI JOURNAL, 2004, 38 (02) : 245 - 265
  • [50] A probability-based approach to match species with reserves when data are at different resolutions
    Alagador, Diogo
    Martins, Maria Joao
    Cerdeira, Jorge Orestes
    Cabeza, Mar
    Bastos Araujo, Miguel
    BIOLOGICAL CONSERVATION, 2011, 144 (02) : 811 - 820