A Fast and More Accurate Seed-and-Extension Density-Based Clustering Algorithm

Authors
Tung, Ming-Hao [1]
Chen, Yi-Ping Phoebe [2]
Liu, Chen-Yu [3]
Liao, Chung-Shou [4]
Affiliations
[1] Micron Technology Inc., Research & Development, Hsinchu, Taiwan
[2] La Trobe University, Department of Computer Science and Information Technology, Melbourne, Australia
[3] National Tsing Hua University, Department of Industrial Engineering and Engineering Management, Hsinchu, Taiwan
[4] National Tsing Hua University, Industrial Engineering and Engineering Management, Hsinchu, Taiwan
Keywords
Clustering algorithms; Heuristic algorithms; Partitioning algorithms; Forestry; Machine learning algorithms; Shape; Numerical models; Center selection; density peaks; seed-and-extension; spanning tree; clustering
DOI
10.1109/TKDE.2022.3161117
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering algorithms have been widely studied in many scientific areas, such as data mining, knowledge discovery, bioinformatics, and machine learning. A density-based clustering algorithm called density peaks (DP), proposed by Rodriguez and Laio, outperforms almost all other approaches. Although DP performs well in many cases, there is still room to improve both the precision of its output clusters and the quality of its selected centers. In this study, we propose a more accurate clustering algorithm, seed-and-extension-based density peaks (SDP). While building a spanning forest, SDP selects centers that retain the features of their clusters and, at the same time, constructs the output clusters in a seed-and-extension manner. Experimental results demonstrate the effectiveness of SDP, especially on clusters with relatively high densities. Specifically, we show that SDP is more accurate than DP and other state-of-the-art clustering approaches in terms of the quality of both output clusters and cluster centers, particularly on a variety of time-series data, while maintaining a running time similar to that of DP. Moreover, SDP outperforms DP in the dynamic model, in which data points may be inserted and deleted. From a practical perspective, SDP is useful for many application problems.
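For reference, the abstract builds on the DP procedure without restating it. The block below is a minimal sketch of the original DP algorithm of Rodriguez and Laio, under these assumptions: Euclidean distance, a cutoff-kernel local density with cutoff `d_c`, and a user-supplied number of centers. The function name `density_peaks` and its parameters are hypothetical. This is not SDP; the abstract does not specify SDP's center-selection or extension rules. What the sketch does show is that linking each point to its nearest denser neighbour yields a spanning tree, and treating the chosen centers as roots splits that tree into a forest whose labels propagate outward from the roots. This may help in reading the abstract's "spanning forest" and "seed-and-extension" terminology.

```python
import numpy as np


def density_peaks(X, d_c, n_centers):
    """Sketch of the original DP clustering (Rodriguez & Laio), NOT SDP.

    Assumptions (hypothetical choices for illustration):
      X         -- (n, d) array-like of points, Euclidean distance
      d_c       -- cutoff distance (> 0) for the cutoff-kernel density
      n_centers -- number of cluster centers, assumed known
    Returns (labels, centers, parent), where parent[i] is the index of the
    nearest denser neighbour of point i (-1 for the densest point).
    """
    assert d_c > 0
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distance matrix (O(n^2) memory; fine for a sketch).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    # Local density rho_i: number of other points within d_c (self excluded).
    rho = (D < d_c).sum(axis=1) - 1
    # Rank points by decreasing density; equal densities are ordered by index.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    # The densest point has no denser neighbour: delta is its largest distance.
    delta[order[0]] = D[order[0]].max()
    for k in range(1, n):
        i = order[k]
        higher = order[:k]  # every point ranked denser than i
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]  # distance to the nearest denser point
        parent[i] = j       # edge of the spanning tree
    # Decision value gamma = rho * delta; the largest values become centers.
    gamma = rho * delta
    # Safeguard: force the tree's root to be a center so every tree in the
    # resulting forest has a labelled root.
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # Visit points in decreasing density; a parent always precedes its child,
    # so each non-center inherits an already assigned label.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, parent
```

On two well-separated blobs, the densest point of each blob receives a large `delta` and is chosen as a center; every other point then inherits its cluster along the tree edges. The tie-breaking by index and the forced root center are practical choices made for this sketch. They are not taken from the paper.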
Pages: 5458-5471
Number of pages: 14