Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Cited by: 16
Authors
Lensen, Andrew [1 ]
Xue, Bing [1 ]
Zhang, Mengjie [1 ]
Affiliations
[1] Victoria Univ Wellington, Evolutionary Computat Res Grp, Wellington 6140, New Zealand
Keywords
Cluster analysis; automatic clustering; genetic programming; similarity function; feature selection; feature construction; FEATURE-SELECTION; MEMETIC ALGORITHM;
DOI
10.1162/evco_a_00264
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
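The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `example_similarity` is a hand-written, hypothetical stand-in for an evolved tree (it uses two of three features and combines them with `max` and addition), and `cluster_by_most_similar` is an assumed, simplified graph-based step that links each instance to its single most similar neighbour and reads clusters off the connected components.

```python
from typing import Callable, List, Sequence

Instance = Sequence[float]
Similarity = Callable[[Instance, Instance], float]


def example_similarity(a: Instance, b: Instance) -> float:
    """Hypothetical stand-in for an evolved similarity function."""
    # Feature selection: only features 0 and 2 are used; feature 1 is ignored.
    d0 = abs(a[0] - b[0])
    d2 = abs(a[2] - b[2])
    # Feature construction: combine the per-feature differences.
    # The result is negated so that a higher value means "more similar".
    return -(max(d0, d2) + 0.5 * d0)


def cluster_by_most_similar(data: List[Instance], sim: Similarity) -> List[int]:
    """Link each instance to its most similar neighbour; label the components."""
    n = len(data)
    if n < 2:
        return [0] * n
    parent = list(range(n))  # union-find forest over instance indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        best = max((j for j in range(n) if j != i),
                   key=lambda j: sim(data[i], data[j]))
        parent[find(i)] = find(best)  # add edge i -> best to the graph

    roots: dict = {}
    # Number the components in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy data: feature 1 is noise, features 0 and 2 separate two groups.
data = [
    (0.0, 9.0, 0.0),
    (0.1, -5.0, 0.2),
    (0.2, 3.0, 0.1),
    (5.0, 9.0, 5.0),
    (5.1, -5.0, 5.2),
]
labels = cluster_by_most_similar(data, example_similarity)
print(labels)
```

Because the similarity function ignores the noisy feature, the first three instances end up in one component and the last two in another. In the approach described above, genetic programming would search for the similarity function itself rather than having it written by hand.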
Pages: 531-561
Page count: 31
Related papers
50 in total
  • [1] Genetic Programming for Evolving Similarity Functions Tailored to Clustering Algorithms
    Andersen, Hayden
    Lensen, Andrew
    Xue, Bing
    2021 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC 2021), 2021, : 688 - 695
  • [2] On the Analysis of Simple Genetic Programming for Evolving Boolean Functions
    Mambrini, Andrea
    Oliveto, Pietro S.
    GENETIC PROGRAMMING, EUROGP 2016, 2016, 9594 : 99 - 114
  • [3] Evolving kernel functions for SVMs by genetic programming
    Diosan, Laura
    Rogozan, Alexandrina
    Pecuchet, Jean-Pierre
    ICMLA 2007: SIXTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2007, : 19 - 24
  • [4] Evolving hash functions by means of genetic programming
    Estebanez, Cesar
    Cesar, Julio
    Ribagorda, Arturo
    GECCO 2006: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOL 1 AND 2, 2006, : 1861 - +
  • [5] Evolving Simpler Constructed Features for Clustering Problems with Genetic Programming
    Schofield, Finn
    Lensen, Andrew
    2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020,
  • [6] Multimodal Genetic Programming by Using Tree Structure Similarity Clustering
    Yoshida, Shubu
    Harada, Tomohiro
    Thawonmas, Ruck
    2017 IEEE 10TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE AND APPLICATIONS (IWCIA), 2017, : 85 - 90
  • [7] Evolving Boolean Functions with Conjunctions and Disjunctions via Genetic Programming
    Doerr, Benjamin
    Lissovoi, Andrei
    Oliveto, Pietro S.
    PROCEEDINGS OF THE 2019 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO'19), 2019, : 1003 - 1011
  • [8] Learning similarity functions for binary strings via genetic programming
    Pebriadi, Muhammad Syahid
    Dewanto, Vektor
    Kusuma, Wisnu Ananta
    Afendi, Farit Mochamad
    Heryanto, Rudi
    2016 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2016, : 476 - 490
  • [9] On genetic programming representations and fitness functions for interpretable dimensionality reduction
    Uriot, Thomas
    Virgolin, Marco
    Alderliesten, Tanja
    Bosman, Peter A. N.
    PROCEEDINGS OF THE 2022 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO'22), 2022, : 458 - 466
  • [10] Evolving meaning: Using genetic Programming to learn similarity perspectives for mining biomedical data
    Sousa, Rita
    Silva, Sara
    Pesquita, Catia
    EUROPEAN JOURNAL OF CLINICAL INVESTIGATION, 2019, 49 : 83 - 83