AutoClust: A Framework for Automated Clustering based on Cluster Validity Indices

Cited by: 10
Authors
Poulakis, Yannis [1 ]
Doulkeridis, Christos [1 ]
Kyriazis, Dimosthenis [1 ]
Affiliation
[1] Univ Piraeus, Dept Digital Syst, Piraeus, Greece
Funding
EU Horizon 2020
Keywords
automatic clustering; hyperparameter tuning; meta-learning; ranking
DOI
10.1109/ICDM50108.2020.00153
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automated machine learning (AutoML) aims to minimize human intervention during a machine learning task, for example by means of automatic algorithm selection and its configuration for the data set at hand. Although this research direction has attracted much interest lately, both in academia and industry, existing systems and tools mainly target the domain of supervised learning. However, unsupervised learning, in particular clustering, also calls for AutoML solutions, especially due to the ambiguity involved when evaluating clustering results. Motivated by this shortcoming, in this paper, we introduce a framework for automated clustering that encompasses two main modules: algorithm selection and hyperparameter tuning. Our approach to algorithm selection relies on meta-learning, based on novel meta-features extracted from data sets that attempt to capture similarities in the clustering structure. This approach is coupled with a method for hyperparameter tuning based on Bayesian optimization, where the main novelty is the proposal of an optimization goal that combines different cluster validity indices. We demonstrate the merits of our approach by empirical evaluation on 24 real-life data sets, which shows promising results when compared to existing methods.
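The idea of tuning a clustering hyperparameter against an objective that combines several cluster validity indices can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the indices are combined, so the mean-rank aggregation below is an assumption, and a plain grid search over the number of clusters stands in for Bayesian optimization. The helper names (`kmeans`, `silhouette`, `davies_bouldin`, `mean_rank`) and the toy data are hypothetical, written with the standard library only.

```python
import math
from statistics import mean


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = tuple(mean(dim) for dim in zip(*members))
    return labels


def group(points, labels):
    """Map each cluster label to the list of its member points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette(points, labels):
    """Mean silhouette width (higher is better)."""
    clusters = group(points, labels)
    if len(clusters) < 2:
        return -1.0
    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:  # singletons get width 0 by convention
            widths.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(mean(math.dist(p, q) for q in other)
                for other_lab, other in clusters.items() if other_lab != lab)
        widths.append((b - a) / (max(a, b) or 1.0))
    return mean(widths)


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better)."""
    clusters = group(points, labels)
    if len(clusters) < 2:
        return math.inf
    cents = {lab: tuple(mean(dim) for dim in zip(*ps)) for lab, ps in clusters.items()}
    scatter = {lab: mean(math.dist(p, cents[lab]) for p in ps) for lab, ps in clusters.items()}
    return mean(
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in cents if j != i)
        for i in cents
    )


def mean_rank(index_values, higher_is_better):
    """Average each candidate's rank over all indices (rank 0 = best)."""
    n = len(next(iter(index_values.values())))
    total = [0.0] * n
    for name, values in index_values.items():
        order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better[name])
        for rank, i in enumerate(order):
            total[i] += rank
    return [t / len(index_values) for t in total]


# Toy data (assumption): three tight 3x3 grids of points, far apart from each other.
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx in offsets for dy in offsets]

# Grid search over k; each candidate is scored on both indices, then ranked.
candidates = [2, 3, 4, 5, 6]
labelings = [kmeans(points, k) for k in candidates]
scores = {
    "silhouette": [silhouette(points, lab) for lab in labelings],
    "davies_bouldin": [davies_bouldin(points, lab) for lab in labelings],
}
ranks = mean_rank(scores, {"silhouette": True, "davies_bouldin": False})
best_k = candidates[min(range(len(candidates)), key=lambda i: ranks[i])]
print("selected k:", best_k)
```

Ranking before averaging sidesteps the fact that the indices live on different scales and point in different directions. A Bayesian optimizer would replace the exhaustive loop over `candidates` with a model-guided choice of the next configuration to evaluate.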
Pages: 1220 - 1225
Page count: 6
Related papers
50 in total
  • [1] TPE-AutoClust: A Tree-based Pipline Ensemble Framework for Automated Clustering
    ElShawi, Radwa
    Sakr, Sherif
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW, 2022, : 1144 - 1153
  • [2] A new clustering algorithm based on cluster validity indices
    Kim, M
    Ramakrishna, RS
    [J]. DISCOVERY SCIENCE, PROCEEDINGS, 2004, 3245 : 322 - 329
  • [3] A Data Clustering Tool with Cluster Validity Indices
    Qiao, Haiyan
    Edwards, Brandon
    [J]. PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTING, ENGINEERING AND INFORMATION, 2009, : 303 - 309
  • [4] Two cluster validity indices for the LAMDA clustering method
    Botia Valderrama, Javier Fernando
    Luis Botia Valderrama, Diego Jose
    [J]. APPLIED SOFT COMPUTING, 2020, 89 (89)
  • [5] Particle Swarm Optimization Based Clustering: A Comparison of Different Cluster Validity Indices
    Liu, Ruochen
    Sun, Xiaojuan
    Jiao, Licheng
    [J]. LIFE SYSTEM MODELING AND INTELLIGENT COMPUTING, PT II, 2010, 98 : 66 - 72
  • [6] Online cluster validity indices for performance monitoring of streaming data clustering
    Moshtaghi, Masud
    Bezdek, James C.
    Erfani, Sarah M.
    Leckie, Christopher
    Bailey, James
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2019, 34 (04) : 541 - 563
  • [7] Some connectivity based cluster validity indices
    Saha, Sriparna
    Bandyopadhyay, Sanghamitra
    [J]. APPLIED SOFT COMPUTING, 2012, 12 (05) : 1555 - 1565
  • [8] A comparison study of cluster validity indices using a nonhierarchical clustering algorithm
    Shim, Yosung
    Chung, Jiwon
    Choi, In-Chan
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 1, PROCEEDINGS, 2006, : 199 - +
  • [9] Word clustering with validity indices
    El Sayed, Ahmad
    Velcin, Julien
    Zighed, Djamel
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, 2008, 5032 : 259 - 270
  • [10] A survey of cluster validity indices for automatic data clustering using differential evolution
    Jose-Garcia, Adan
    Gomez-Flores, Wilfrido
    [J]. PROCEEDINGS OF THE 2021 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO'21), 2021, : 314 - 322