DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets

Cited by: 0
Authors
Ghanem, Tamer F. [1 ,4 ,5 ]
Elkilani, Wail S. [2 ,4 ,6 ]
Ahmed, Hatem S. [3 ,4 ,5 ]
Hadhoud, Mohiy M. [1 ,4 ,5 ]
Affiliations
[1] Menofiya Univ, Dept Informat Technol, Shibin Al Kawm, Menofiya, Egypt
[2] Menofiya Univ, Comp Syst Dept, Shibin Al Kawm, Menofiya, Egypt
[3] Menofiya Univ, Dept Informat Syst, Shibin Al Kawm, Menofiya, Egypt
[4] Menofiya Univ, Fac Comp & Informat, Shibin Al Kawm, Menofiya, Egypt
[5] Menofiya Univ, Shibin Al Kawm, Menofiya, Egypt
[6] Ain Shams Univ, Cairo, Egypt
Keywords
Clustering; subspace clustering; density-based clustering
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Clustering multi-density, large-scale, high-dimensional datasets is a challenging task due to the high time complexity of most clustering algorithms. Data collection tools now produce large amounts of data, so fast algorithms are a vital requirement for clustering such data. In this paper, a fast clustering algorithm called Dimension-based Partitioning and Merging (DPM) is proposed. In DPM, data is first partitioned into small dense volumes during the successive processing of the dataset dimensions. Then, noise is filtered out using the dimensional densities of the generated partitions. Finally, a merging process is invoked to construct clusters based on the boundary data samples of the partitions. DPM automatically detects the number of clusters using three tuning parameters to which its results are insensitive, which reduces the burden of using it. Performance evaluation of the proposed algorithm on different datasets shows its speed and accuracy compared to competing clustering algorithms.
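The three stages named in the abstract (partition per dimension, filter noise, merge) can be illustrated with a toy sketch. This is not the authors' DPM implementation: the abstract gives no algorithmic detail, so the function names and the three parameters `gap`, `min_pts`, and `merge_dist` are hypothetical. The sketch also swaps in simpler rules: a partition-size threshold stands in for the dimensional-density noise filter, and all points of a partition are compared during merging rather than boundary samples only.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def split_by_gaps(points, dim, gap):
    """Split points wherever consecutive sorted values along `dim` differ by more than `gap`."""
    ordered = sorted(points, key=lambda p: p[dim])
    parts, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[dim] - prev[dim] > gap:
            parts.append(current)
            current = []
        current.append(cur)
    parts.append(current)
    return parts


def toy_partition_and_merge(points, gap, min_pts, merge_dist):
    """Toy three-stage clustering; all parameters are hypothetical, not taken from the paper."""
    # Stage 1: partition by processing the dimensions one after another.
    partitions = [list(points)]
    for dim in range(len(points[0])):
        partitions = [part for p in partitions for part in split_by_gaps(p, dim, gap)]

    # Stage 2: treat sparse partitions as noise (assumed rule: simple size threshold).
    dense = [p for p in partitions if len(p) >= min_pts]

    # Stage 3: merge partitions whose closest pair of points lies within
    # merge_dist, tracking merged groups with a small union-find.
    parent = list(range(len(dense)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(dense)), 2):
        if any(dist(a, b) <= merge_dist for a in dense[i] for b in dense[j]):
            parent[find(i)] = find(j)

    clusters = {}
    for i, part in enumerate(dense):
        clusters.setdefault(find(i), []).extend(part)
    return list(clusters.values())


# Two small blobs plus one isolated point that is dropped as noise.
points = [(0, 0), (0.1, 0.1), (0.2, 0), (5, 5), (5.1, 5.2), (5.2, 5.1), (10, 0)]
clusters = toy_partition_and_merge(points, gap=1.0, min_pts=2, merge_dist=1.0)
print(sorted(len(c) for c in clusters))
```

The number of clusters is not passed in; it falls out of the partitioning and merging thresholds, which mirrors the abstract's claim that the cluster count is detected automatically. The pairwise merge in this sketch is quadratic in partition size and is not meant to reflect the speed the paper reports.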
Pages: 71-79
Number of pages: 9
Related papers
50 in total
  • [1] DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets
    Ghanem, Tamer F.
    Elkilani, Wail S.
    Ahmed, Hatem S.
    Hadhoud, Mohiy M.
    [J]. 2014 10TH INTERNATIONAL COMPUTER ENGINEERING CONFERENCE (ICENCO), 2014, : 26 - 35
  • [2] A fast fuzzy clustering algorithm for large-scale datasets
    Shi, LK
    He, PL
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2005, 3584 : 203 - 208
  • [3] AGRID: An efficient algorithm for clustering large high-dimensional datasets
    Zhao, YC
    Song, JD
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2003, 2637 : 271 - 282
  • [4] Fast adaptive clustering by synchronization on large scale datasets
    Ying, Wenhao
    Xu, Min
    Wang, Shitong
    Deng, Zhaohong
    [J]. Ying, W. (cslgywh@163.com), 1600, Science Press (51): : 707 - 720
  • [5] Parallel algorithms for clustering high-dimensional large-scale datasets
    Nagesh, H
    Goil, S
    Choudhary, A
    [J]. DATA MINING FOR SCIENTIFIC AND ENGINEERING APPLICATIONS, 2001, 2 : 335 - 356
  • [6] A novel algorithm for fast and scalable subspace clustering of high-dimensional data
    Kaur A.
    Datta A.
    [J]. Journal of Big Data, 2015, 2 (01)
  • [7] LARGE-SCALE HIGH-DIMENSIONAL CLUSTERING WITH FAST SKETCHING
    Chatalic, Antoine
    Gribonval, Remi
    Keriven, Nicolas
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4714 - 4718
  • [8] An Efficient Density Biased Sampling Algorithm for Clustering Large High-Dimensional Datasets
    Qian, Xue-Zhong
    Deng, Jie
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2015, 29 (08)
  • [9] Efficient Hierarchical Clustering of Large High Dimensional Datasets
    Gilpin, Sean
    Qian, Buyue
    Davidson, Ian
    [J]. PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013, : 1371 - 1380
  • [10] A fast classification strategy for SVM on the large-scale high-dimensional datasets
    I-Jing Li
    Jiunn-Lin Wu
    Chih-Hung Yeh
    [J]. Pattern Analysis and Applications, 2018, 21 : 1023 - 1038