A Novel Approach for Clustering High-Dimensional Data using Kernel Hubness

被引:0
|
作者
Amina, M. [1 ]
Farook, Syed K. [1 ]
机构
[1] MES Coll Engn, Comp Sci & Engn Dept, Kuttippuram, Kerala, India
关键词
Clustering; High dimensional clustering; Hub based clustering; Kernal;
D O I
10.1109/ICACC.2015.67
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Clustering of high dimensionality data which can be seen in almost all fields these days is becoming very tedious process. The key disadvantage of high dimensional data which we can pen down is curse of dimensionality. As the magnitude of datasets grows the data points become sparse and density of area becomes less making it difficult to cluster that data which further reduces the performance of traditional algorithms used for clustering. To route these toils, hubness based algorithms were introduced. These algorithms which influences the distribution of the data points among the k-nearest neighbor. The hubness is an unguided method which finds out which points appear more frequently in the k-nearest neighbor than other points in the dataset. Mainly three algorithms are used for hub based clustering such as K-hubs, Hubness proportional clustering and Hubness proportional K-means. K-hubs algorithm is used to initialize the hubs for the clusters. Hubness Proportional Clustering (HPC) algorithm is used group the probabilistic data models. Hubness Proportional K-Means (HPKM) algorithm integrates the hubness based centroid selection and partitioning process. These algorithms are basically used for increasing the efficiency and increasing predicting accuracy of the system. The main drawback of in this method is number of iteration increasing with dimensionality is increased. To overcome this drawback a new algorithm is proposed which is based on the combination of kernel mapping and hubness phenomenon. The proposed algorithm detects arbitrary shaped clusters in the dataset and also improves the performance of clustering by minimizing the intra-cluster distance and maximizing the inter-cluster distance which improves the cluster quality.
引用
收藏
页码:94 / 97
页数:4
相关论文
共 50 条
  • [1] The Role of Hubness in Clustering High-Dimensional Data
    Tomasev, Nenad
    Radovanovic, Milos
    Mladenic, Dunja
    Ivanovic, Mirjana
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (03) : 739 - 751
  • [2] The Role of Hubness in Clustering High-Dimensional Data
    Tomasev, Nenad
    Radovanovic, Milos
    Mladenic, Dunja
    Ivanovic, Mirjana
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6634 : 183 - 195
  • [3] Clustering High-Dimensional Data using AE-Hubness
    Xu Yang
    2020 INTERNATIONAL CONFERENCE ON BIG DATA & ARTIFICIAL INTELLIGENCE & SOFTWARE ENGINEERING (ICBASE 2020), 2020, : 90 - 93
  • [4] The role of hubness in high-dimensional data analysis
    Tomašev, Nenad
    Informatica (Slovenia), 2014, 38 (04): : 387 - 388
  • [5] The Role Of Hubness in High-dimensional Data Analysis
    Tomasev, Nenad
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2014, 38 (04): : 387 - 388
  • [6] Clustering High-Dimensional Stock Data using Data Mining Approach
    Indriyanti, Dhea
    Dhini, Arian
    2019 16TH INTERNATIONAL CONFERENCE ON SERVICE SYSTEMS AND SERVICE MANAGEMENT (ICSSSM2019), 2019,
  • [7] Fast approximate hubness reduction for large high-dimensional data
    Feldbauer, Roman
    Leodolter, Maximilian
    Plant, Claudia
    Flexer, Arthur
    2018 9TH IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (ICBK), 2018, : 358 - 367
  • [8] Subspace Clustering of High-Dimensional Data: An Evolutionary Approach
    Vijendra, Singh
    Laxman, Sahoo
    APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2013, 2013
  • [9] Subspace clustering of high-dimensional data: a predictive approach
    Brian McWilliams
    Giovanni Montana
    Data Mining and Knowledge Discovery, 2014, 28 : 736 - 772
  • [10] Subspace clustering of high-dimensional data: a predictive approach
    McWilliams, Brian
    Montana, Giovanni
    DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 28 (03) : 736 - 772