DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets

Cited by: 0

Authors:

Ghanem, Tamer F. [1, 4, 5]

Elkilani, Wail S. [2, 4, 6]

Ahmed, Hatem S. [3, 4, 5]

Hadhoud, Mohiy M. [1, 4, 5]

Affiliations:

[1] Menofiya Univ, Dept Informat Technol, Shibin Al Kawm, Menofiya, Egypt

[2] Menofiya Univ, Comp Syst Dept, Shibin Al Kawm, Menofiya, Egypt

[3] Menofiya Univ, Dept Informat Syst, Shibin Al Kawm, Menofiya, Egypt

[4] Menofiya Univ, Fac Comp & Informat, Shibin Al Kawm, Menofiya, Egypt

[5] Menofiya Univ, Shibin Al Kawm, Menofiya, Egypt

[6] Ain Shams Univ, Cairo, Egypt

Source:

2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES) | 2014

Keywords:

Clustering; subspace clustering; density-based clustering;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Clustering multi-density, large-scale, high-dimensional datasets is a challenging task due to the high time complexity of most clustering algorithms. Modern data collection tools produce large amounts of data, so fast algorithms are a vital requirement for clustering such data. In this paper, a fast clustering algorithm called Dimension-based Partitioning and Merging (DPM) is proposed. In DPM, data is first partitioned into small dense volumes during the successive processing of the dataset dimensions. Then, noise is filtered out using the dimensional densities of the generated partitions. Finally, a merging process is invoked to construct clusters based on partition boundary data samples. The DPM algorithm automatically detects the number of data clusters based on three insensitive tuning parameters, which reduces the burden of using it. Performance evaluation of the proposed algorithm on different datasets shows its speed and accuracy compared with competing clustering algorithms.
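
The three stages named in the abstract (partition, filter noise, merge) can be illustrated with a toy sketch. This is not the authors' DPM algorithm: the abstract does not specify the partitioning rule, the density measure, the merge criterion, or the three tuning parameters. The function names and the parameters `gap`, `min_pts`, and `merge_dist` below are hypothetical, chosen only to show the general shape of a partition-then-merge pipeline, and the input is assumed to be a non-empty list of equal-length numeric tuples.

```python
from itertools import combinations
from math import dist


def split_on_gaps(points, dim, gap):
    """Split points wherever consecutive sorted values along `dim` differ by more than `gap`."""
    ordered = sorted(points, key=lambda p: p[dim])
    groups, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[dim] - prev[dim] > gap:
            groups.append(current)
            current = []
        current.append(cur)
    groups.append(current)
    return groups


def partition_merge_sketch(points, gap, min_pts, merge_dist):
    """Toy partition / noise-filter / merge pipeline (illustrative assumptions only)."""
    # Stage 1: refine the partitions by processing one dimension after another.
    partitions = [list(points)]
    for dim in range(len(points[0])):
        partitions = [g for part in partitions for g in split_on_gaps(part, dim, gap)]

    # Stage 2: treat partitions with fewer than `min_pts` points as noise.
    dense = [p for p in partitions if len(p) >= min_pts]

    # Stage 3: merge partitions whose closest points lie within `merge_dist`,
    # using a small union-find structure over partition indices.
    parent = list(range(len(dense)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(dense)), 2):
        if any(dist(a, b) <= merge_dist for a in dense[i] for b in dense[j]):
            parent[find(i)] = find(j)

    clusters = {}
    for i, part in enumerate(dense):
        clusters.setdefault(find(i), []).extend(part)
    return list(clusters.values())


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.2),
           (10.0, 0.0)]  # the last point is isolated
    for cluster in partition_merge_sketch(pts, gap=1.0, min_pts=2, merge_dist=0.5):
        print(cluster)
```

The merge stage here compares every pair of points across partitions, which is quadratic and only suitable for small inputs. The abstract indicates that DPM merges using partition boundary samples instead, which this sketch does not attempt to reproduce.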


Pages: 71-79

Number of pages: 9
