An SNN-DBSCAN Based Clustering Algorithm for Big Data

Cited by: 2
Authors
Pandey, Sriniwas [1]
Samal, Mamata
Mohanty, Sraban Kumar [1]
Affiliations
[1] PDPM Indian Inst Informat Technol Design & Mfg, Comp Sci & Engn, Jabalpur 482005, Madhya Pradesh, India
Keywords
Clustering; SNN clustering; External memory algorithms; Big data clustering
DOI
10.1007/978-981-15-1081-6_11
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is a technique for partitioning data into groups such that the data items in a group are more similar to each other than to the data points in any other group. Most clustering algorithms are designed under the assumption that main memory is unbounded, but this assumption fails as the size of the data set grows. The data must then be stored in secondary memory, and the time spent on input/output (I/O) dominates the actual computation time. Reducing the I/O therefore improves the efficiency of clustering techniques. In this paper, a shared near neighbor (SNN) based algorithm is redesigned to minimize its I/O complexity, making it suitable for Big Data in the external memory model proposed by Aggarwal and Vitter. The computational steps are unchanged, so cluster quality remains the same. We implement the algorithm using the STXXL library to show its efficacy on Big Data sets.
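For readers unfamiliar with shared near neighbor clustering, the following is a minimal in-memory sketch of the SNN similarity and SNN density that this family of algorithms builds on. It is not the paper's external-memory algorithm and does not use STXXL; the function names (`knn_lists`, `snn_similarity`, `snn_density`), the brute-force neighbor search, and the convention that similarity is zero unless two points appear in each other's neighbor lists are assumptions made for illustration.

```python
from math import dist


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Brute-force O(n^2 log n) search; hypothetical helper for illustration only.
    """
    neighbors = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of k-nearest neighbors shared by points i and j.

    Assumed convention: the similarity is 0 unless i and j appear in
    each other's k-nearest-neighbor lists.
    """
    if j not in neighbors[i] or i not in neighbors[j]:
        return 0
    return len(neighbors[i] & neighbors[j])


def snn_density(neighbors, i, eps):
    """Count the neighbors of i whose SNN similarity with i is at least eps."""
    return sum(1 for j in neighbors[i] if snn_similarity(neighbors, i, j) >= eps)


# Two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
neighbors = knn_lists(points, k=3)
same_group = snn_similarity(neighbors, 0, 1)    # points in the same group
cross_group = snn_similarity(neighbors, 0, 4)   # points in different groups
density_0 = snn_density(neighbors, 0, eps=2)
```

In a DBSCAN-style pass, points whose SNN density reaches a chosen minimum would be treated as core points. In the external-memory setting the abstract describes, the neighbor lists would not fit in main memory, which is why the I/O cost of reading them becomes the dominant term.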
Pages: 127-137
Page count: 11
Related papers
50 in total
  • [1] BISDBx: towards batch-incremental clustering for dynamic datasets using SNN-DBSCAN
    Bhattacharjee, Panthadeep
    Mitra, Pinaki
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2020, 23 (02) : 975 - 1009
  • [2] BISDBx: towards batch-incremental clustering for dynamic datasets using SNN-DBSCAN
    Panthadeep Bhattacharjee
    Pinaki Mitra
    [J]. Pattern Analysis and Applications, 2020, 23 : 975 - 1009
  • [3] DBSCAN Clustering Algorithm Based on Big Data Is Applied in Network Information Security Detection
    Zhang, Yan
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2022, 2022
  • [4] K-DBSCAN: An improved DBSCAN algorithm for big data
    Nahid Gholizadeh
    Hamid Saadatfar
    Nooshin Hanafi
    [J]. The Journal of Supercomputing, 2021, 77 : 6214 - 6235
  • [5] K-DBSCAN: An improved DBSCAN algorithm for big data
    Gholizadeh, Nahid
    Saadatfar, Hamid
    Hanafi, Nooshin
    [J]. JOURNAL OF SUPERCOMPUTING, 2021, 77 (06) : 6214 - 6235
  • [6] Laser Radar Data Registration Algorithm Based on DBSCAN Clustering
    Liu, Yiting
    Zhang, Lei
    Li, Peijuan
    Jia, Tong
    Du, Junfeng
    Liu, Yawen
    Li, Rui
    Yang, Shutao
    Tong, Jinwu
    Yu, Hanqi
    [J]. ELECTRONICS, 2023, 12 (06)
  • [7] RETRACTED: DBSCAN Clustering Algorithm Based on Big Data Is Applied in Network Information Security Detection (Retracted Article)
    Zhang, Yan
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2022, 2022
  • [8] A Clustering Algorithm Based on FDP and DBSCAN
    Wang, Zhihe
    Huang, Mengying
    Du, Hui
    Qin, Hongwu
    [J]. 2018 14TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2018 : 145 - 149
  • [9] A fast DBSCAN algorithm for big data based on efficient density calculation
    Hanafi, Nooshin
    Saadatfar, Hamid
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 203
  • [10] μDBSCAN: An Exact Scalable DBSCAN Algorithm for Big Data Exploiting Spatial Locality
    Sarma, Aditya
    Goyal, Poonam
    Kumari, Sonal
    Wani, Anand
    Challa, Jagat Sesh
    Islam, Saiyedul
    Goyal, Navneet
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2019 : 171 - 181