A Hybrid and Parameter-Free Clustering Algorithm for Large Data Sets

Cited by: 4
Authors
Shao, Hengkang [1]
Zhang, Ping [1]
Chen, Xinye [1]
Li, Fang [1]
Du, Guanglong [1]
Affiliations
[1] South China University of Technology, School of Computer Science and Engineering, Guangzhou 510006, Guangdong, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Clustering on large data sets; sampling strategy; cluster tendency; clustering number; ENHANCED VISUAL ANALYSIS; TENDENCY; MEDOIDS
DOI
10.1109/ACCESS.2019.2900260
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As an important unsupervised learning method, clustering can effectively find hidden structures in data. As the amount of data grows, clustering large data sets becomes a challenging task. Many clustering algorithms have been developed for small data sets, but they are often inefficient on large ones. Moreover, most clustering algorithms require extra input parameters, which may not be easy to obtain in practical applications. This paper proposes a new clustering algorithm called the hybrid and parameter-free clustering method (HPFCM). HPFCM can rapidly cluster large data sets without knowing the number of clusters in advance. It consists of five steps: sampling the large data set (MMRS* sampling), assessing the clustering tendency of the sample (eVAT), determining the number of clusters (EPB), forming the partitions (MST tree cutting), and extending the results to the rest of the data set. We compare HPFCM with three other methods that are popular for clustering large data sets. Several numerical and real-world experiments have been conducted to verify the algorithm. The results show the great potential and effectiveness of HPFCM for clustering large data sets.
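The sample, partition, and extend structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions: plain random sampling stands in for MMRS* sampling, the eVAT and EPB steps are not implemented (so the number of clusters `k` is passed in by hand, which the real method avoids), and the extension step is assumed to assign each point the label of its nearest sampled point. All function names are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the n-1 tree edges as (weight, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def cut_mst(n, edges, k):
    """Drop the k-1 longest MST edges and label the connected components."""
    kept = sorted(edges)[: len(edges) - (k - 1)]
    root = list(range(n))  # union-find forest

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for _, i, j in kept:
        root[find(i)] = find(j)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def cluster_large(data, sample_size, k, seed=0):
    """Cluster a sample by MST cutting, then extend labels to all points.

    Assumptions: random sampling replaces MMRS*, and k is given
    instead of being estimated from the sample.
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(data)), sample_size)
    sample = [data[i] for i in idx]
    sample_labels = cut_mst(len(sample), mst_edges(sample), k)
    # Extension step (assumed): nearest sampled point decides the label.
    return [
        sample_labels[min(range(len(sample)), key=lambda s: math.dist(p, sample[s]))]
        for p in data
    ]
```

Only the sample goes through the quadratic-cost MST step; every other point needs one pass over the sample, which is why this kind of pipeline scales to large data sets.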
Pages: 24806-24818
Number of pages: 13