Research on the Parallelization of the DBSCAN Clustering Algorithm for Spatial Data Mining Based on the Spark Platform

Cited by: 31
Authors
Huang, Fang [1, 2]
Zhu, Qiang [1]
Zhou, Ji [1]
Tao, Jian [3, 4]
Zhou, Xiaocheng [5]
Jin, Du [1]
Tan, Xicheng [6]
Wang, Lizhe [7, 8]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Resources & Environm, 2006 Xiyuan Ave, Chengdu 611731, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Inst Remote Sensing Big Data, Big Data Res Ctr, 2006 Xiyuan Rd, Chengdu 611731, Sichuan, Peoples R China
[3] Texas A&M Univ, Texas A&M Engn Expt Stn, College Stn, TX 77843 USA
[4] Texas A&M Univ, High Performance Res Comp, College Stn, TX 77843 USA
[5] Fuzhou Univ, Minist Educ, Key Lab Spatial Data Min & Informat Sharing, 2 Xueyuan Rd, Fuzhou 350116, Fujian, Peoples R China
[6] Wuhan Univ, Int Sch Software, 129 Luoyu Rd, Wuhan 430079, Hubei, Peoples R China
[7] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[8] Chinese Acad Sci, Inst Remote Sensing & Digital Earth, Beijing 10094, Peoples R China
Funding
National Science Foundation (United States);
Keywords
spatial data mining; DBSCAN algorithm; parallel computing; spark platform; traffic congestion area discovery; GPU IMPLEMENTATION; MR-DBSCAN; MAPREDUCE;
DOI
10.3390/rs9121301
Chinese Library Classification number
X [Environmental science, safety science];
Discipline classification code
08 ; 0830 ;
Abstract
Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm that can discover clusters of arbitrary shape, effectively distinguish noise points, and naturally support spatial databases. DBSCAN has been widely used in spatial data mining. This paper studies the design and implementation of a parallel DBSCAN algorithm on the Spark platform, addressing the following problems that arise when computing macro data: the heavy computational load of the single-node algorithm; the low resource utilization of the multi-node algorithm; the long run time; and the lack of real-time responsiveness. The experimental results indicate that the proposed parallel design achieves a more stable speedup as the scale of the spatial data increases.
Pages: 33
Related papers
50 in total
  • [1] Research on Parallel Design of DBSCAN Clustering Algorithm in Spatial Data Mining
    Zhou, Gong-jian
    [J]. 2018 INTERNATIONAL CONFERENCE ON ELECTRICAL, CONTROL, AUTOMATION AND ROBOTICS (ECAR 2018), 2018, 307 : 343 - 348
  • [2] Research on Retailer Data Clustering Algorithm Based on Spark
    Huang, Qiuman
    Zhou, Feng
    [J]. ADVANCES IN MATERIALS, MACHINERY, ELECTRONICS I, 2017, 1820
  • [3] Research on differential privacy preserving clustering algorithm based on spark platform
    Meng, Qianqian
    Zhou, Lijuan
    [J]. Journal of Computers (Taiwan), 2018, 29 (01) : 47 - 62
  • [4] Parallelization of K-Means Clustering Algorithm for Data Mining
    Jiang, Hao
    Yu, Liyan
    [J]. 4TH ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND APPLICATIONS (ITA 2017), 2017, 12
  • [5] Research of parallel DBSCAN clustering algorithm based on MapReduce
    [J]. Fu, X. (xffu@gdut.edu.cn), 1600, Science and Engineering Research Support Society (07):
  • [6] Research and Application of DBSCAN Algorithm Based on Hadoop Platform
    Fu, Xiufen
    Wang, Yaguang
    Ge, Yanna
    Chen, Peiwen
    Teng, Shaohua
    [J]. PERVASIVE COMPUTING AND THE NETWORKED WORLD, 2014, 8351 : 73 - 87
  • [7] Comparative Study of Two Parallel Algorithm K-Means and DBSCAN Clustering on Spark Platform
    Bouhout, Safae
    Oubenaalla, Youness
    Nfaoui, El Habib
    [J]. ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418 : 245 - 262
  • [8] STRP-DBSCAN: A Parallel DBSCAN Algorithm Based on Spatial-Temporal Random Partitioning for Clustering Trajectory Data
    An, Xiaoya
    Wang, Ziming
    Wang, Ding
    Liu, Song
    Jin, Cheng
    Xu, Xinpeng
    Cao, Jianjun
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (20):
  • [9] Distributed Clustering Algorithm for Spatial Data Mining
    Bendechache, Malika
    Kechadi, M-Tahar
    [J]. PROCEEDINGS 2015 SECOND IEEE INTERNATIONAL CONFERENCE ON SPATIAL DATA MINING AND GEOGRAPHICAL KNOWLEDGE SERVICES (ICSDM 2015), 2015, : 60 - 65
  • [10] Dual clustering algorithm for spatial data mining
    Zhou, Jiaogen
    Guan, Jihong
    Bian, Fuling
    [J]. Journal of Computational Information Systems, 2006, 2 (04): : 1405 - 1410