Correcting Bias in Crowdsourced Data to Map Bicycle Ridership of All Bicyclists

Cited by: 52
Authors
Roy, Avipsa [1]
Nelson, Trisalyn A. [1]
Fotheringham, A. Stewart [1]
Winters, Meghan [2]
Affiliations
[1] Arizona State Univ, Sch Geog Sci & Urban Planning, 975 S Myrtle Ave, COOR Hall, 5th Floor, Tempe, AZ 85281 USA
[2] Simon Fraser Univ, Fac Hlth Sci, Blusson Hall, 8888 Univ Dr, Burnaby, BC V5A 1S6, Canada
Keywords
bias correction; LASSO; active transportation; big data; crowdsourcing; BUILT ENVIRONMENT; GEOGRAPHIC INFORMATION; PHYSICAL-ACTIVITY; TRANSPORTATION; WALKING; CHOICES; MODELS
DOI
10.3390/urbansci3020062
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Traditional methods of counting bicyclists are resource-intensive and generate data with sparse spatial and temporal detail. Previous research suggests that big data from crowdsourced fitness apps offer a new source of bicycling data with high spatial and temporal resolution. However, crowdsourced bicycling data are biased because they oversample recreational riders. Our goals are to quantify geographical variables that can help correct bias in crowdsourced data, and to develop a generalized method for correcting bias in big crowdsourced data on bicycle ridership in different settings, in order to generate maps for cities that are representative of all bicyclists at a street-level spatial resolution. We used street-level ridership data for 2016 from a crowdsourced fitness app (Strava), geographical covariate data, and official counts from 44 locations across Maricopa County, Arizona, USA (training data), and from 60 locations in the city of Tempe, within Maricopa County (test data). First, we quantified the relationship between Strava and official ridership data volumes. Second, we used a multi-step approach, with variable selection using LASSO followed by Poisson regression, to integrate geographical covariates, Strava, and training data to correct bias. Finally, we predicted bias-corrected average annual daily bicyclist counts for Tempe and evaluated the model's accuracy using the test data. We found a correlation between the annual ridership data from Strava and official counts (R² = 0.76) in Maricopa County for 2016. The significant variables for correcting bias were the proportion of white population, median household income, traffic speed, distance to residential areas, and distance to green spaces. The model could correct bias in crowdsourced data from Strava in Tempe, with 86% of road segments being predicted within a margin of ±100 average annual bicyclists.
Our results indicate that it is possible to map ridership for cities at the street-level by correcting bias in crowdsourced bicycle ridership data, with access to adequate data from official count programs and geographical covariates at a comparable spatial and temporal resolution.
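
The two-stage approach described in the abstract (LASSO variable selection, then Poisson regression, then a check of how many sites fall within ±100 of the observed count) can be illustrated with a minimal, standard-library-only sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the three covariates are synthetic, mutually uncorrelated stand-ins (not Strava or census data), the counts are generated from a known log-linear model, LASSO is run on log(1 + count) with a hand-picked penalty `lam`, and the helper names (`standardize`, `lasso_select`, `solve`, `fit_poisson`) are hypothetical.

```python
import math

def standardize(col):
    """Scale one covariate column to zero mean and unit variance."""
    m = sum(col) / len(col)
    s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
    return [(v - m) / s for v in col]

def lasso_select(cols, y, lam, sweeps=200):
    """Coordinate-descent LASSO on standardized columns.
    Returns the indices of covariates whose coefficient stays non-zero."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * len(cols)
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            # Covariance of column j with the residual that excludes column j.
            rho = sum(col[i] * (resid[i] + beta[j] * col[i]) for i in range(n)) / n
            new = math.copysign(max(abs(rho) - lam, 0.0), rho)  # soft threshold
            delta = new - beta[j]
            if delta:
                resid = [resid[i] - delta * col[i] for i in range(n)]
                beta[j] = new
    return [j for j, b in enumerate(beta) if b != 0.0]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def fit_poisson(rows, y, iters=25):
    """Newton-Raphson fit of log(mu) = rows . beta (rows start with 1 for the intercept)."""
    n, p = len(y), len(rows[0])
    beta = [math.log(sum(y) / n)] + [0.0] * (p - 1)
    for _ in range(iters):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
        grad = [sum((y[i] - mu[i]) * rows[i][j] for i in range(n)) for j in range(p)]
        hess = [[sum(mu[i] * rows[i][j] * rows[i][k] for i in range(n))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Synthetic data (assumption): 44 sites, three uncorrelated stand-in covariates.
n = 44
theta = [2 * math.pi * i / n for i in range(n)]
cols = [standardize([math.sin(t) for t in theta]),      # stand-in for app ridership
        standardize([math.cos(t) for t in theta]),      # stand-in for traffic speed
        standardize([math.sin(2 * t) for t in theta])]  # irrelevant covariate
counts = [round(math.exp(4.0 + 0.6 * cols[0][i] - 0.3 * cols[1][i])) for i in range(n)]

# Stage 1: LASSO on log-transformed counts keeps only informative covariates.
selected = lasso_select(cols, [math.log(1 + c) for c in counts], lam=0.1)
# Stage 2: Poisson regression on the selected covariates.
rows = [[1.0] + [cols[j][i] for j in selected] for i in range(n)]
beta = fit_poisson(rows, counts)
# Evaluation: share of sites predicted within +/- 100 of the observed count.
pred = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
share = sum(abs(p - c) <= 100 for p, c in zip(pred, counts)) / n
print(selected, [round(b, 2) for b in beta], share)
```

In practice the penalty would more plausibly be chosen by cross-validation, and accuracy would be assessed on held-out sites (as the abstract does with the Tempe test data) rather than on the training sites used here.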
Pages: 20