High-Performance Spatial Query Processing on Big Taxi Trip Data using GPGPUs

Cited by: 10
Authors
Zhang, Jianting [1]
You, Simin [2]
Gruenwald, Le [3]
Affiliations
[1] CUNY, Dept Comp Sci, New York, NY 10021 USA
[2] CUNY, Grad Ctr, Dept Comp Sci, New York, NY USA
[3] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
Keywords
High Performance; Spatial Query; Big Data; Taxi Trip; GPGPU;
DOI
10.1109/BigData.Congress.2014.20
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
City-wide GPS-recorded taxi trip data contains rich information for traffic and travel analysis to facilitate transportation planning and urban studies. However, traditional data management techniques are largely incapable of processing big taxi trip data at the scale of hundreds of millions of trips. In this study, we aim to use General-Purpose computing on Graphics Processing Units (GPGPU) technologies to speed up the processing of complex spatial queries on big taxi data using inexpensive commodity GPUs. By using the land use types of tax lot polygons as a proxy for trip purposes at the pickup and drop-off locations, we formulate a taxi trip data analysis problem as a large-scale nearest neighbor spatial query problem based on point-to-polygon distance. Experiments on nearly 170 million taxi trips in New York City (NYC) in 2009 and 735,488 tax lot polygons with 4,698,986 vertices have demonstrated the efficiency of the proposed techniques: the GPU implementation is about 10-20X faster than the host system and completes the spatial query in about a minute on a low-end workstation equipped with an Nvidia GTX Titan GPU device, with a total equipment cost below $2,000. We further discuss several interesting patterns discovered from the query results which warrant further study. The proposed approach can be an interesting alternative to traditional MapReduce/Hadoop-based approaches to processing big data with respect to performance and cost.
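The query formulation in the abstract (assign each pickup or drop-off point to its nearest tax lot polygon by point-to-polygon distance) can be illustrated with a minimal CPU sketch. This is a brute-force illustration only, not the authors' GPU implementation. The function names and the toy square lots are hypothetical, and treating a point that falls inside a polygon as distance 0 is an assumption made here.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to the segment a-b."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project the point onto the segment's line and clamp to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(px, py, ring):
    """Even-odd ray casting test against a single ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        ax, ay = ring[i]
        bx, by = ring[(i + 1) % n]
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def point_polygon_distance(px, py, ring):
    """Distance to the polygon; 0 if the point is inside (an assumption)."""
    if point_in_polygon(px, py, ring):
        return 0.0
    n = len(ring)
    return min(
        point_segment_distance(px, py, *ring[i], *ring[(i + 1) % n])
        for i in range(n)
    )


def nearest_polygon(px, py, polygons):
    """Brute-force scan: return (index, distance) of the closest polygon."""
    best_idx, best_dist = -1, math.inf
    for idx, ring in enumerate(polygons):
        d = point_polygon_distance(px, py, ring)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist


# Two hypothetical square "tax lots" and three trip endpoints.
lots = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],
    [(5, 0), (7, 0), (7, 2), (5, 2)],
]
for point in [(1, 1), (3, 1), (4, 1)]:
    print(point, nearest_polygon(*point, lots))
```

A scan like this costs one distance evaluation per point-polygon pair, which is impractical at the scale the abstract reports; it serves only to make the distance definition concrete.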
Pages: 72-79
Page count: 8
Related papers
50 in total
  • [11] Towards an Efficient Top-K Trajectory Similarity Query Processing Algorithm for Big Trajectory Data on GPGPUs
    Leal, Eleazar
    Gruenwald, Le
    Zhang, Jianting
    You, Simin
    2016 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2016, 2016, : 206 - 213
  • [12] High-Performance Geospatial Big Data Processing System Based on MapReduce
    Jo, Junghee
    Lee, Kang-Woo
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2018, 7 (10):
  • [13] Diversification on big data in query processing
    Zhang, Meifan
    Wang, Hongzhi
    Li, Jianzhong
    Gao, Hong
    FRONTIERS OF COMPUTER SCIENCE, 2020, 14 (04)
  • [14] On the Impact of Memory Allocation on High-Performance Query Processing
    Durner, Dominik
    Leis, Viktor
    Neumann, Thomas
    15TH INTERNATIONAL WORKSHOP ON DATA MANAGEMENT ON NEW HARDWARE (DAMON 2019), 2019,
  • [15] Diversification on big data in query processing
    Meifan Zhang
    Hongzhi Wang
    Jianzhong Li
    Hong Gao
    Frontiers of Computer Science, 2020, 14
  • [16] Sketching-based High-Performance Biomedical Big Data Processing Accelerator
    Kulkarni, Amey
    Jafari, Ali
    Sagedy, Chris
    Mohsenin, Tinoosh
    2016 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2016, : 1138 - 1141
  • [17] Integrated High-Performance Platform for Fast Query Response in Big Data with Hive, Impala, and SparkSQL: A Performance Evaluation
    Chang, Bao Rong
    Tsai, Hsiu-Fen
    Lee, Yun-Da
    APPLIED SCIENCES-BASEL, 2018, 8 (09):
  • [18] A High-performance Spatial Query Engine for Large Event Data Sets Implemented for the Fermi LAT Data
    Stephens, Thomas E.
    ASTRONOMICAL DATA ANALYSIS SOFTWARE AND SYSTEMS XVIII, 2009, 411 : 197 - 200
  • [19] A distributed system for fining high profit areas over big taxi trip data with MognoDB and Spark
    Putri, Fadhilah Kurnia
    Kwon, Joonho
    2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017, : 533 - 536
  • [20] Federated Query processing for Big Data in Data Science
    Muniswamaiah, Manoj
    Agerwala, Tilak
    Tappert, Charles C.
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 6145 - 6147