A clustering framework for unbalanced partitioning and outlier filtering on high dimensional datasets

Authors
Bilgin, Turgay Tugay [1]
Camurcu, A. Yilmaz [2]
Affiliations
[1] Maltepe Univ, Dept Comp Engn, Istanbul, Turkey
[2] Marmara Univ, Dept Elect & Comp Educ, Istanbul, Turkey
Keywords
data mining; dimensionality; clustering; outlier filtering
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this study, we propose an improved relationship-based clustering framework for unbalanced clustering and outlier filtering on high-dimensional datasets. The original relationship-based clustering framework is built on METIS, a weighted graph partitioning system. However, it has two major drawbacks: it performs no outlier filtering, and it forces clusters to be balanced. Our proposed framework instead uses Graclus, an unbalanced partitioning system based on kernel k-means. We make two major improvements over the original framework. First, we introduce a new space consisting of tiny unbalanced partitions created with Graclus, which we call the micro-partition space. We then filter out singletons and micro-partitions that have fewer members than a threshold value. Second, we agglomerate the filtered micro-partition space and apply Graclus again for clustering. The results are visualized with CLUSION. Our experiments show that the proposed framework produces promising results on high-dimensional datasets.
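The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the micro-partition labels are assumed to be given already (in the paper they come from Graclus, which is not called here), and the function name `filter_micro_partitions` and the parameter `min_size` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def filter_micro_partitions(labels, min_size):
    """Group point indices by micro-partition and drop the small groups.

    labels:   micro-partition id of each point (assumed precomputed; the
              paper obtains them from Graclus).
    min_size: hypothetical threshold; partitions with fewer members than
              this, including singletons, are treated as outliers.
    Returns (kept, outliers): a dict of partition id -> member indices,
    and a sorted list of the indices that were filtered out.
    """
    members = defaultdict(list)
    for point_index, label in enumerate(labels):
        members[label].append(point_index)

    kept = {}
    outliers = []
    for label, indices in members.items():
        if len(indices) < min_size:
            outliers.extend(indices)  # too small: filter out
        else:
            kept[label] = indices     # survives into the filtered space
    return kept, sorted(outliers)


# Toy example: nine points assigned to four micro-partitions.
labels = [0, 0, 0, 1, 2, 2, 3, 0, 2]
kept, outliers = filter_micro_partitions(labels, min_size=2)
print(kept)
print(outliers)
```

In this toy input, partitions 1 and 3 are singletons, so their points are reported as outliers, while partitions 0 and 2 remain as units that a second clustering pass could then operate on.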
Pages: 205 / +
Page count: 2