Clustering in very large databases based on distance and density

Cited by: 0

Authors:

Weining Qian

XueQing Gong

AoYing Zhou

Affiliation:

[1] Fudan University, Department of Computer Science and Engineering, The Laboratory for Intelligent Information Processing

Source:

Journal of Computer Science and Technology | 2003 / Vol. 18

Keywords:

data mining; very large database; clustering

DOI:

Not available

Chinese Library Classification (CLC) number:

Subject classification number:

Abstract:

Clustering in very large databases or data warehouses, with many applications in areas such as spatial computation, web information collection, pattern recognition and economic analysis, is a huge task that challenges data mining researchers. Current clustering methods commonly suffer from the following problems: 1) scanning the whole database leads to high I/O cost and expensive maintenance (e.g., R*-tree); 2) the uncertain parameter k must be specified in advance, so clustering can only be refined by repeated trial and error; 3) they lack efficiency in handling clusters of arbitrary shape in very large data sets. In this paper, we first present a new hybrid clustering algorithm to solve these problems. The new algorithm, which combines distance and density strategies, can handle clusters of arbitrary shape effectively. It makes full use of statistical information during mining to greatly reduce time complexity while maintaining good clustering quality. Furthermore, the algorithm can easily eliminate noise and identify outliers. An experimental evaluation is performed on a spatial database with this method and other popular clustering algorithms (CURE and DBSCAN). The results show that our algorithm outperforms them in terms of efficiency and cost, and achieves even greater speedup as the data size scales up.

Pages: 67-76

Page count: 9

50 results in total

[1] Clustering in very large databases based on distance and density
Qian, WN
Gong, XQ
Zhou, AY
[J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2003, 18 (01) : 67 - 76
[2] An efficient density based clustering algorithm for large databases
El-Sonbaty, Y
Ismail, MA
Farouk, M
[J]. ICTAI 2004: 16TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, : 673 - 677
[3] WIDE: Clustering algorithm for very large databases
School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China
[J]. Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban), 2006, (07) : 826 - 831
[4] Clustering and validation for very large databases (VLDB)
Momin, Bashirahamad Fardin
[J]. 2006 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2007, : 258 - 263
[5] A fast density-based clustering algorithm for large databases
Liu, Bing
[J]. PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 996 - 1000
[6] Short documents clustering in very large text databases
Wang, Yongheng
Jia, Yan
Yang, Shuqiang
[J]. WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS, 2006, 4256 : 83 - 93
[7] Hybridized Fragmentation of Very Large Databases Using Clustering
Harikumar, Sandhya
Ramachandran, Raji
[J]. 2015 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, INFORMATICS, COMMUNICATION AND ENERGY SYSTEMS (SPICES), 2015,
[8] Scalable grid-based clustering algorithm for very large spatial databases
Sun, Yufen
Lu, Yansheng
[J]. 2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 763 - 768
[9] WaveCluster: a wavelet-based clustering approach for spatial data in very large databases
Gholamhosein Sheikholeslami
Surojit Chatterjee
Aidong Zhang
[J]. The VLDB Journal, 2000, 8 : 289 - 304
[10] WINP: A window-based incremental and parallel clustering algorithm for very large databases
Qiang, Z
Zheng, Z
Wei, SZ
Daley, E
[J]. ICTAI 2005: 17TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, : 169 - 176