DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets

Cited by: 0
Authors
Ghanem, Tamer F. [1, 4, 5]
Elkilani, Wail S. [2, 4, 6]
Ahmed, Hatem S. [3, 4, 5]
Hadhoud, Mohiy M. [1, 4, 5]
Affiliations
[1] Menofiya Univ, Dept Informat Technol, Shibin Al Kawm, Menofiya, Egypt
[2] Menofiya Univ, Comp Syst Dept, Shibin Al Kawm, Menofiya, Egypt
[3] Menofiya Univ, Dept Informat Syst, Shibin Al Kawm, Menofiya, Egypt
[4] Menofiya Univ, Fac Comp & Informat, Shibin Al Kawm, Menofiya, Egypt
[5] Menofiya Univ, Shibin Al Kawm, Menofiya, Egypt
[6] Ain Shams Univ, Cairo, Egypt
Keywords
Clustering; subspace clustering; density-based clustering
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Clustering large-scale, high-dimensional datasets with multiple densities is a challenging task due to the high time complexity of most clustering algorithms. Modern data collection tools produce large amounts of data, so fast algorithms are a vital requirement for clustering such data. In this paper, a fast clustering algorithm called Dimension-based Partitioning and Merging (DPM) is proposed. In DPM, the data is first partitioned into small dense volumes while the dataset dimensions are processed one after another. Noise is then filtered out using the dimensional densities of the generated partitions. Finally, a merging process constructs clusters based on the data samples lying on partition boundaries. DPM automatically detects the number of clusters and relies on only three tuning parameters to which it is insensitive, which reduces the burden of using it. A performance evaluation of the proposed algorithm on different datasets demonstrates its speed and accuracy compared with competing clustering algorithms.
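The abstract describes DPM only at the level of three stages, so the following is a toy Python sketch of that general partition, filter, and merge pattern under our own assumptions; it is not the authors' DPM. The splitting rule (cut a partition wherever consecutive sorted values in the current dimension are more than `gap` apart), the noise test (drop partitions with fewer than `min_points` samples, a crude stand-in for the paper's dimensional densities), and the merging rule (join two partitions if any of their boundary samples, taken here as the per-dimension minimum and maximum points, lie within `merge_dist` of each other) are all hypothetical, as are the function and parameter names.

```python
from math import dist


def split_by_gaps(points, idx, dim, gap):
    """Split one partition (a list of point indices) along dimension `dim`,
    cutting wherever consecutive sorted values differ by more than `gap`."""
    order = sorted(idx, key=lambda i: points[i][dim])
    parts, current = [], [order[0]]
    for prev, cur in zip(order, order[1:]):
        if points[cur][dim] - points[prev][dim] > gap:
            parts.append(current)
            current = []
        current.append(cur)
    parts.append(current)
    return parts


def boundary_samples(points, part):
    """Hypothetical boundary set: the points at the min and max of each dimension."""
    chosen = set()
    for k in range(len(points[part[0]])):
        chosen.add(min(part, key=lambda i: points[i][k]))
        chosen.add(max(part, key=lambda i: points[i][k]))
    return chosen


def toy_partition_merge(points, gap, min_points, merge_dist):
    """Toy partition/filter/merge clustering. `gap`, `min_points` and
    `merge_dist` are hypothetical parameters, NOT DPM's actual three.
    Returns one label per point; -1 marks noise."""
    # 1) Partition: refine the partitions one dimension at a time.
    parts = [list(range(len(points)))]
    for dim in range(len(points[0])):
        parts = [sub for p in parts for sub in split_by_gaps(points, p, dim, gap)]

    # 2) Noise filtering: drop sparse partitions (simple count threshold).
    dense = [p for p in parts if len(p) >= min_points]

    # 3) Merge partitions whose boundary samples are close, via union-find.
    parent = list(range(len(dense)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    bounds = [boundary_samples(points, p) for p in dense]
    for a in range(len(dense)):
        for b in range(a + 1, len(dense)):
            if any(dist(points[i], points[j]) <= merge_dist
                   for i in bounds[a] for j in bounds[b]):
                parent[find(a)] = find(b)

    # Number clusters in order of first appearance; unassigned points stay -1.
    labels = [-1] * len(points)
    cluster_ids = {}
    for k, p in enumerate(dense):
        cid = cluster_ids.setdefault(find(k), len(cluster_ids))
        for i in p:
            labels[i] = cid
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05),
           (5, 5), (5.1, 5.2), (5.2, 5.1), (5.05, 5.15),
           (10, 0)]
    print(toy_partition_merge(pts, gap=1.0, min_points=3, merge_dist=1.0))
    # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because clusters emerge from the union-find merge rather than from a preset k, the cluster count is never an input here, which loosely mirrors the abstract's claim of automatic cluster-count detection. The pairwise merge loop is quadratic in the number of dense partitions, so a version aimed at large datasets would need a spatial index or a cheaper boundary test.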
Pages: 71-79
Number of pages: 9