Correcting Bias in Crowdsourced Data to Map Bicycle Ridership of All Bicyclists

Cited by: 52

Authors:

Roy, Avipsa [1]

Nelson, Trisalyn A. [1]

Fotheringham, A. Stewart [1]

Winters, Meghan [2]

Affiliations:

[1] Arizona State Univ, Sch Geog Sci & Urban Planning, 975 S Myrtle Ave, COOR Hall, 5th Floor, Tempe, AZ 85281 USA

[2] Simon Fraser Univ, Fac Hlth Sci, Blusson Hall, 8888 Univ Dr, Burnaby, BC V5A 1S6, Canada

Source:

URBAN SCIENCE | 2019 / Vol. 3 / Issue 02

Keywords:

bias correction; LASSO; active transportation; big data; crowdsourcing; BUILT ENVIRONMENT; GEOGRAPHIC INFORMATION; PHYSICAL-ACTIVITY; TRANSPORTATION; WALKING; CHOICES; MODELS;

DOI:

10.3390/urbansci3020062

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Traditional methods of counting bicyclists are resource-intensive and generate data with sparse spatial and temporal detail. Previous research suggests that big data from crowdsourced fitness apps offer a new source of bicycling data with high spatial and temporal resolution. However, crowdsourced bicycling data are biased because they oversample recreational riders. Our goals are to quantify geographical variables that can help correct bias in crowdsourced data, and to develop a generalized method for correcting bias in big crowdsourced data on bicycle ridership in different settings, in order to generate maps for cities that are representative of all bicyclists at a street-level spatial resolution. We used street-level ridership data for 2016 from a crowdsourced fitness app (Strava), geographical covariate data, and official counts from 44 locations across Maricopa County, Arizona, USA (training data), and 60 locations from the city of Tempe, within Maricopa County (test data). First, we quantified the relationship between Strava and official ridership data volumes. Second, we used a multi-step approach with variable selection using LASSO followed by Poisson regression to integrate geographical covariates, Strava, and training data to correct bias. Finally, we predicted bias-corrected average annual daily bicyclist counts for Tempe and evaluated the model's accuracy using the test data. We found a correlation between the annual ridership data from Strava and official counts (R² = 0.76) in Maricopa County for 2016. The significant variables for correcting bias were the proportion of white population, median household income, traffic speed, distance to residential areas, and distance to green spaces. The model could correct bias in crowdsourced data from Strava in Tempe, with 86% of road segments predicted within a margin of ±100 average annual bicyclists. Our results indicate that it is possible to map ridership for cities at the street level by correcting bias in crowdsourced bicycle ridership data, given access to adequate data from official count programs and geographical covariates at a comparable spatial and temporal resolution.
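
The two accuracy figures quoted in the abstract (an R² between Strava and official counts, and the share of road segments predicted within a fixed margin) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes that R² means the squared Pearson correlation and that the margin check is a simple absolute difference per segment. The function names and all numbers below are hypothetical and chosen only for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    Assumption: this is what the reported R^2 refers to.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def share_within_margin(predicted, observed, margin=100):
    """Fraction of segments whose predicted count is within +/- margin
    of the observed count."""
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= margin)
    return hits / len(observed)


# Made-up counts for five hypothetical road segments (not data from the paper).
strava = [12, 30, 55, 9, 80]
observed = [120, 340, 560, 80, 900]
predicted = [150, 300, 700, 60, 850]

print(round(r_squared(strava, observed), 3))
print(share_within_margin(predicted, observed))  # 4 of 5 segments -> 0.8
```

In this toy example, only the third segment misses the ±100 margin (700 predicted against 560 observed), so the share is 0.8. The paper reports the analogous figure as 86% for the Tempe test locations.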


Pages: 20

