A Novel Approach for Clustering High-Dimensional Data using Kernel Hubness

Times cited: 0
Authors
Amina, M. [1 ]
Farook, Syed K. [1 ]
Affiliations
[1] MES Coll Engn, Comp Sci & Engn Dept, Kuttippuram, Kerala, India
Keywords
Clustering; High dimensional clustering; Hub based clustering; Kernel
DOI
10.1109/ICACC.2015.67
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering high-dimensional data, which now appears in almost every field, is becoming a very tedious process. The key disadvantage of high-dimensional data is the curse of dimensionality. As the dimensionality of a dataset grows, the data points become sparse and the density of any region drops, which makes the data difficult to cluster and degrades the performance of traditional clustering algorithms. To overcome these difficulties, hubness-based algorithms were introduced. These algorithms exploit hubness, which influences the distribution of data points among the k-nearest-neighbor lists. Hubness is an unsupervised measure that identifies the points that appear in the k-nearest-neighbor lists of other points more frequently than the remaining points in the dataset. Three main algorithms are used for hub-based clustering: K-hubs, Hubness Proportional Clustering (HPC), and Hubness Proportional K-Means (HPKM). The K-hubs algorithm initializes the hubs for the clusters. The HPC algorithm groups the probabilistic data models. The HPKM algorithm integrates hubness-based centroid selection with the partitioning process. These algorithms are mainly used to increase the efficiency and predictive accuracy of the system. Their main drawback is that the number of iterations increases as the dimensionality increases. To overcome this drawback, a new algorithm is proposed that combines kernel mapping with the hubness phenomenon. The proposed algorithm detects arbitrarily shaped clusters in the dataset and improves cluster quality by minimizing the intra-cluster distance and maximizing the inter-cluster distance.
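The hubness score mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain Euclidean distance and NumPy; the function names `k_occurrence` and `top_hubs` are hypothetical, and picking the highest-scoring points as candidate centers is only an illustration, not the K-hubs, HPC, HPKM, or kernel-based algorithm of the paper.

```python
import numpy as np

def k_occurrence(X, k):
    """Count, for each row of X, how many other points list it
    among their k nearest neighbors (Euclidean distance assumed)."""
    sq_norms = np.sum(X * X, axis=1)
    # Pairwise squared distances: ||x||^2 + ||y||^2 - 2 x.y
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dist = np.maximum(sq_dist, 0.0)   # clip small negative rounding errors
    np.fill_diagonal(sq_dist, np.inf)    # a point is not its own neighbor
    neighbors = np.argsort(sq_dist, axis=1, kind="stable")[:, :k]
    return np.bincount(neighbors.ravel(), minlength=X.shape[0])

def top_hubs(scores, m):
    """Indices of the m points with the highest hubness scores
    (hypothetical helper; ties keep the lower index first)."""
    return np.argsort(-scores, kind="stable")[:m]

if __name__ == "__main__":
    X = np.array([[0.0], [1.0], [2.5], [10.0]])
    scores = k_occurrence(X, k=1)
    print(scores, top_hubs(scores, 1))
```

Points with a high score are the hubs; in high-dimensional data the distribution of these scores tends to become strongly skewed, which is what hub-based clustering methods rely on.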
Pages: 94-97
Page count: 4
Related papers
50 in total
  • [31] A novel attribute weighting algorithm for clustering high-dimensional categorical data
    Bai, Liang
    Liang, Jiye
    Dang, Chuangyin
    Cao, Fuyuan
    PATTERN RECOGNITION, 2011, 44 (12) : 2843 - 2861
  • [32] A kernel-based approach for detecting outliers of high-dimensional biological data
    Jung Hun Oh
    Jean Gao
    BMC Bioinformatics, 10
  • [33] A kernel-based approach for detecting outliers of high-dimensional biological data
    Oh, Jung Hun
    Gao, Jean
    BMC BIOINFORMATICS, 2009, 10
  • [34] Hubness-aware kNN classification of high-dimensional data in presence of label noise
    Tomasev, Nenad
    Buza, Krisztian
    NEUROCOMPUTING, 2015, 160 : 157 - 172
  • [35] An effective clustering scheme for high-dimensional data
    He, Xuansen
    He, Fan
    Fan, Yueping
    Jiang, Lingmin
    Liu, Runzong
    Maalla, Allam
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 45001 - 45045
  • [36] Approximated clustering of distributed high-dimensional data
    Kriegel, HP
    Kunath, P
    Pfeifle, M
    Renz, M
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518 : 432 - 441
  • [37] Clustering High-Dimensional Noisy Categorical Data
    Tian, Zhiyi
    Xu, Jiaming
    Tang, Jen
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024,
  • [38] Subspace selection for clustering high-dimensional data
    Baumgartner, C
    Plant, C
    Kailing, K
    Kriegel, HP
    Kröger, P
    FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 11 - 18
  • [39] An Initialization Method for Clustering High-Dimensional Data
    Chen, Luying
    Chen, Lifei
    Jiang, Qingshan
    Wang, Beizhan
    Shi, Liang
    FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 444 - +
  • [40] Clustering of imbalanced high-dimensional media data
    Brodinova, Sarka
    Zaharieva, Maia
    Filzmoser, Peter
    Ortner, Thomas
    Breiteneder, Christian
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2018, 12 (02) : 261 - 284