Clustering in very large databases based on distance and density

Cited by: 0
Authors
Weining Qian
XueQing Gong
AoYing Zhou
Affiliation
[1] Fudan University, Department of Computer Science and Engineering, The Laboratory for Intelligent Information Processing
Keywords
data mining; very large database; clustering
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Clustering in very large databases or data warehouses, with many applications in areas such as spatial computation, web information collection, pattern recognition and economic analysis, is a huge task that challenges data mining researchers. Current clustering methods commonly suffer from the following problems: 1) scanning the whole database leads to high I/O cost and expensive maintenance (e.g., R*-tree); 2) the uncertain parameter k must be specified in advance, so the clustering can only be refined through repeated trial and error; 3) they lack efficiency in handling arbitrarily shaped clusters in very large data sets. In this paper, we first present a new hybrid clustering algorithm to solve these problems. This new algorithm, which combines both distance and density strategies, can handle clusters of arbitrary shape effectively. It makes full use of statistical information during mining to greatly reduce the time complexity while maintaining good clustering quality. Furthermore, this algorithm can easily eliminate noise and identify outliers. An experimental evaluation is performed on a spatial database with this method and other popular clustering algorithms (CURE and DBSCAN). The results show that our algorithm outperforms them in terms of efficiency and cost, and that its speedup increases further as the data size scales up.
Pages: 67-76
Number of pages: 9
Related papers
50 items in total
  • [1] Clustering in very large databases based on distance and density
    Qian, WN
    Gong, XQ
    Zhou, AY
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2003, 18 (01) : 67 - 76
  • [2] An efficient density based clustering algorithm for large databases
    El-Sonbaty, Y
    Ismail, MA
    Farouk, M
    [J]. ICTAI 2004: 16TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, : 673 - 677
  • [3] WIDE: Clustering algorithm for very large databases
    School of Electronic Information Engineering, Tianjin University, Tianjin 300072, China
    [J]. Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban), 2006, 7 (826-831):
  • [4] Clustering and validation for very large databases (VLDB)
    Momin, Bashirahamad Fardin
    [J]. 2006 INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2007, : 258 - 263
  • [5] A fast density-based clustering algorithm for large databases
    Liu, Bing
    [J]. PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 996 - 1000
  • [6] Short documents clustering in very large text databases
    Wang, Yongheng
    Jia, Yan
    Yang, Shuqiang
    [J]. WEB INFORMATION SYSTEMS - WISE 2006 WORKSHOPS, PROCEEDINGS, 2006, 4256 : 83 - 93
  • [7] Hybridized Fragmentation of Very Large Databases Using Clustering
    Harikumar, Sandhya
    Ramachandran, Raji
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, INFORMATICS, COMMUNICATION AND ENERGY SYSTEMS (SPICES), 2015,
  • [8] Scalable grid-based clustering algorithm for very large spatial databases
    Sun, Yufen
    Lu, Yansheng
    [J]. 2006 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PTS 1 AND 2, PROCEEDINGS, 2006, : 763 - 768
  • [9] WaveCluster: a wavelet-based clustering approach for spatial data in very large databases
    Gholamhosein Sheikholeslami
    Surojit Chatterjee
    Aidong Zhang
    [J]. The VLDB Journal, 2000, 8 : 289 - 304
  • [10] WINP: A window-based incremental and parallel clustering algorithm for very large databases
    Qiang, Z
    Zheng, Z
    Wei, SZ
    Daley, E
    [J]. ICTAI 2005: 17TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, : 169 - 176