High-Performance Spatial Query Processing on Big Taxi Trip Data using GPGPUs

Cited by: 10
Authors
Zhang, Jianting [1]
You, Simin [2]
Gruenwald, Le [3]
Affiliations
[1] CUNY, Dept Comp Sci, New York, NY 10021 USA
[2] CUNY, Grad Ctr, Dept Comp Sci, New York, NY USA
[3] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
Keywords
High Performance; Spatial Query; Big Data; Taxi Trip; GPGPU
DOI
10.1109/BigData.Congress.2014.20
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
City-wide GPS-recorded taxi trip data contain rich information for traffic and travel analysis that can facilitate transportation planning and urban studies. However, traditional data management techniques are largely incapable of processing big taxi trip data at the scale of hundreds of millions of records. In this study, we aim to use General-Purpose computing on Graphics Processing Units (GPGPU) technologies to speed up the processing of complex spatial queries on big taxi data using inexpensive commodity GPUs. By using the land use types of tax lot polygons as a proxy for trip purposes at the pickup and drop-off locations, we formulate a taxi trip data analysis problem as a large-scale nearest neighbor spatial query problem based on point-to-polygon distance. Experiments on nearly 170 million taxi trips in New York City (NYC) in 2009 and 735,488 tax lot polygons with 4,698,986 vertices demonstrate the efficiency of the proposed techniques: the GPU implementation is about 10-20X faster than the host system and completes the spatial query in about a minute on a low-end workstation equipped with an Nvidia GTX Titan GPU device, at a total equipment cost below $2,000. We further discuss several interesting patterns discovered in the query results that warrant further study. The proposed approach can be an interesting alternative to traditional MapReduce/Hadoop-based approaches to processing big data with respect to performance and cost.
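To make the query formulation concrete, the following is a minimal CPU-side sketch in Python of a nearest neighbor query based on point-to-polygon distance. It is not the authors' GPU implementation: it uses a brute-force scan with no spatial index, and all function names and sample coordinates are hypothetical. It also assumes that a point lying inside a polygon has distance zero, which the abstract does not specify.

```python
import math

def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: A and B coincide
        return math.hypot(px - ax, py - ay)
    # Project P onto the line through A and B, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_in_polygon(px, py, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, implicitly closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        ax, ay = ring[i]
        bx, by = ring[(i + 1) % n]
        if (ay > py) != (by > py):  # edge straddles the horizontal ray
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside

def point_polygon_distance(px, py, ring):
    """Distance from a point to a polygon (assumption: zero if inside)."""
    if point_in_polygon(px, py, ring):
        return 0.0
    n = len(ring)
    return min(
        point_segment_distance(px, py, *ring[i], *ring[(i + 1) % n])
        for i in range(n)
    )

def nearest_polygon(px, py, polygons):
    """Brute-force scan; returns (index, distance) of the closest polygon."""
    distances = [point_polygon_distance(px, py, ring) for ring in polygons]
    best = min(range(len(polygons)), key=distances.__getitem__)
    return best, distances[best]

# Hypothetical data: two square "tax lots" and their land use types.
lots = [
    [(0, 0), (2, 0), (2, 2), (0, 2)],
    [(5, 0), (7, 0), (7, 2), (5, 2)],
]
land_use = ["residential", "commercial"]

idx, dist = nearest_polygon(4.0, 1.0, lots)  # a pickup or drop-off location
print(land_use[idx], dist)
```

A scan like this costs one distance evaluation per point-polygon pair, which is impractical at the scale reported in the abstract; it is shown only to illustrate the distance definition that the query rests on.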
Pages: 72-79
Page count: 8