PDBSCAN: Parallel DBSCAN for Large-Scale Clustering Applications

Cited by: 4
Authors
谢永红 [1, 2]
马延辉 [1, 2]
周芳 [1, 2]
刘颖安 [1, 2]
Affiliations
[1] School of Computer and Communication Engineering, University of Science and Technology Beijing
[2] Key Laboratory for Advanced Control of Iron and Steel Process, Ministry of Education, University of Science and Technology Beijing
Keywords
parallel DBSCAN; clustering; MapReduce; large-scale applications
DOI
10.19884/j.1672-5220.2012.01.018
Chinese Library Classification (CLC) number
TP311.13
Discipline classification code
1201
Abstract
DBSCAN (density-based spatial clustering of applications with noise) is a classic density-based spatial clustering algorithm, widely applied because it captures clusters of arbitrary shape and detects outliers well. In practice, however, datasets are often too large for serial DBSCAN to handle. This paper proposes a parallel algorithm, Parallel DBSCAN (PDBSCAN), to address that limitation. PDBSCAN is built on the MapReduce mechanism, and parallelism is applied to the two steps that require substantial computational resources: region queries and candidate-queue processing. As a result, PDBSCAN scales to large datasets and is well suited to e-commerce applications, especially recommendation.
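The abstract does not give the paper's implementation, so the following is only a minimal sketch of how a region query could be split in a map/reduce style. It assumes Euclidean distance, an in-memory point list, and a thread pool standing in for MapReduce workers; the names `map_region_query`, `reduce_neighbors`, `parallel_region_query`, and the `n_chunks` and `min_pts` values are hypothetical and not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from math import dist


def map_region_query(chunk, query, eps):
    """Map step: indices of the points in one chunk that lie within eps of query."""
    return [idx for idx, point in chunk if dist(point, query) <= eps]


def reduce_neighbors(left, right):
    """Reduce step: merge the partial neighbor lists of two chunks."""
    return left + right


def parallel_region_query(points, query, eps, n_chunks=4):
    """Eps-neighborhood of query, computed per chunk and then merged."""
    indexed = list(enumerate(points))
    # Round-robin split; a real MapReduce job would partition the input files.
    chunks = [indexed[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = pool.map(lambda c: map_region_query(c, query, eps), chunks)
        return sorted(reduce(reduce_neighbors, partials, []))


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.4), (5.0, 5.0), (5.2, 5.1)]
neighbors = parallel_region_query(points, query=(0.0, 0.0), eps=1.0)
# min_pts = 3 is an illustrative threshold, not a value from the paper.
is_core = len(neighbors) >= 3
```

In this sketch each chunk is scanned independently, so the map step could run on separate workers, while the reduce step only concatenates small index lists. Candidate-queue processing, the other step the abstract names, is not covered here.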
Pages: 76-79
Page count: 4