DPM: Fast and scalable Clustering Algorithm for Large Scale High Dimensional Datasets

Cited by: 0

Authors:

Ghanem, Tamer F. [1,4,5]

Elkilani, Wail S. [2,4,6]

Ahmed, Hatem S. [3,4,5]

Hadhoud, Mohiy M. [1,4,5]

Affiliations:

[1] Menofiya Univ, Dept Informat Technol, Shibin Al Kawm, Menofiya, Egypt

[2] Menofiya Univ, Comp Syst Dept, Shibin Al Kawm, Menofiya, Egypt

[3] Menofiya Univ, Dept Informat Syst, Shibin Al Kawm, Menofiya, Egypt

[4] Menofiya Univ, Fac Comp & Informat, Shibin Al Kawm, Menofiya, Egypt

[5] Menofiya Univ, Shibin Al Kawm, Menofiya, Egypt

[6] Ain Shams Univ, Cairo, Egypt

Source:

2014 9th International Conference on Computer Engineering & Systems (ICCES) | 2014

Keywords:

Clustering; subspace clustering; density-based clustering

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Clustering multi-density, large-scale, high-dimensional datasets is challenging due to the high time complexity of most clustering algorithms. Modern data collection tools produce large amounts of data, so fast algorithms are vital for clustering it. This paper proposes a fast clustering algorithm called Dimension-based Partitioning and Merging (DPM). DPM first partitions the data into small dense volumes while processing the dataset dimensions successively. It then filters out noise using the dimensional densities of the generated partitions. Finally, a merging process constructs clusters based on the data samples at partition boundaries. DPM automatically detects the number of clusters and relies on three tuning parameters to which it is insensitive, which reduces the burden of using it. Performance evaluation on different datasets shows that the proposed algorithm is fast and accurate compared with competing clustering algorithms.
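
The three stages named in the abstract (partition along successive dimensions, filter noise, merge partitions) can be illustrated with a minimal Python sketch. This is not the authors' DPM implementation: the abstract does not give the splitting rule, the density measure, the merging criterion, or the names of the three parameters. The sketch assumes gap-based splitting, a point-count noise filter in place of dimensional densities, and merging by minimum pairwise distance over all points rather than boundary samples only. The names `split_by_gaps`, `dpm_like_clusters`, `gap`, `min_points`, and `merge_dist` are hypothetical.

```python
from itertools import combinations
from math import dist


def split_by_gaps(points, dim, gap):
    """Split points wherever consecutive sorted values along `dim` differ by more than `gap`."""
    ordered = sorted(points, key=lambda p: p[dim])
    parts, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[dim] - prev[dim] > gap:
            parts.append(current)
            current = []
        current.append(cur)
    parts.append(current)
    return parts


def dpm_like_clusters(points, gap, min_points, merge_dist):
    """Illustrative partition / filter / merge pipeline (assumed rules, not the paper's)."""
    if not points:
        return []

    # Stage 1: refine the partitions one dimension at a time.
    partitions = [list(points)]
    for dim in range(len(points[0])):
        partitions = [part for p in partitions for part in split_by_gaps(p, dim, gap)]

    # Stage 2 (assumed rule): treat partitions with too few points as noise.
    dense = [p for p in partitions if len(p) >= min_points]

    # Stage 3 (assumed rule): merge partitions whose closest points are within
    # merge_dist, tracking merged groups with a small union-find structure.
    parent = list(range(len(dense)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(range(len(dense)), 2):
        closest = min(dist(p, q) for p in dense[a] for q in dense[b])
        if closest <= merge_dist:
            parent[find(a)] = find(b)

    clusters = {}
    for i, part in enumerate(dense):
        clusters.setdefault(find(i), []).extend(part)
    return list(clusters.values())
```

In this sketch the number of clusters is not an input; it falls out of the partitioning and merging steps, which mirrors the abstract's claim that the cluster count is detected automatically. The all-pairs distance check in the merge stage is quadratic and is used only for brevity.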


Pages: 71-79

Page count: 9
