A general framework for clustering high-dimensional datasets

Authors
Zhao, YC [1]
Junde, S [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing 100088, Peoples R China
Keywords
data mining; clustering; high-dimensional
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many fields, the datasets used in data mining applications are high-dimensional. Most existing clustering algorithms are effective and efficient when the dimensionality is low, but both their speed and the quality of their results degrade in high-dimensional data spaces. One reason is that their complexity increases exponentially with the dimensionality. To address this problem, we put forward a general framework for clustering high-dimensional datasets. Combined with our framework, common clustering algorithms can be applied to high-dimensional datasets efficiently. The framework breaks a high-dimensional clustering task into several one- or two-dimensional clustering phases, each involving only one or two dimensions. In this way, common algorithms for clustering low-dimensional datasets can be used to process high-dimensional ones. In addition, attributes of different types can be processed with different algorithms in separate phases, so datasets of hybrid data types can be handled easily. Our experiments demonstrate the efficiency and effectiveness of the framework.
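The phase-by-phase idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the results of the individual phases are combined, so everything below is an assumption made for illustration: each phase looks at a single dimension, splits every current cluster with a simple gap-based 1-D clustering, and passes the refined clusters to the next phase. The names `gap_clusters_1d`, `phased_clustering`, and the `max_gap` parameter are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def gap_clusters_1d(values: List[float], max_gap: float) -> List[int]:
    """Label 1-D values: sorted neighbours no more than max_gap apart share a label."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > max_gap:
            current += 1  # a large gap starts a new 1-D cluster
        labels[nxt] = current
    return labels


def phased_clustering(points: List[Point], max_gap: float) -> List[int]:
    """Refine clusters one dimension at a time (assumed combination rule)."""
    if not points:
        return []
    groups: List[List[int]] = [list(range(len(points)))]  # start with one cluster
    for dim in range(len(points[0])):  # one phase per dimension
        refined: List[List[int]] = []
        for group in groups:
            # Only the current dimension is inspected during this phase.
            labels = gap_clusters_1d([points[i][dim] for i in group], max_gap)
            buckets: Dict[int, List[int]] = {}
            for idx, lab in zip(group, labels):
                buckets.setdefault(lab, []).append(idx)
            refined.extend(buckets.values())
        groups = refined
    result = [0] * len(points)
    for cluster_id, group in enumerate(groups):
        for idx in group:
            result[idx] = cluster_id
    return result


if __name__ == "__main__":
    data = [(0.0, 0.0, 0.0), (0.1, 0.2, 0.1), (5.0, 5.0, 5.0),
            (5.1, 5.2, 4.9), (0.0, 5.0, 0.0)]
    print(phased_clustering(data, max_gap=1.0))
```

Any other low-dimensional clustering routine could stand in for `gap_clusters_1d`, and a different routine could be used per attribute type, which is the kind of substitution the abstract describes for hybrid data.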
Pages: 1091-1094
Page count: 4