Genetic Programming for Evolving Similarity Functions for Clustering: Representations and Analysis

Cited by: 16
Authors
Lensen, Andrew [1 ]
Xue, Bing [1 ]
Zhang, Mengjie [1 ]
Affiliations
[1] Victoria Univ Wellington, Evolutionary Computat Res Grp, Wellington 6140, New Zealand
Keywords
Cluster analysis; automatic clustering; genetic programming; similarity function; feature selection; feature construction; memetic algorithm
DOI
10.1162/evco_a_00264
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is a difficult and widely studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g., Euclidean distance) to decide which instances to assign to the same cluster. These similarity measures are generally predefined and cannot be easily tailored to the properties of a particular dataset, which leads to limitations in the quality and the interpretability of the clusters produced. In this article, we propose a new approach to automatically evolving similarity functions for a given clustering algorithm by using genetic programming. We introduce a new genetic programming-based method which automatically selects a small subset of features (feature selection) and then combines them using a variety of functions (feature construction) to produce dynamic and flexible similarity functions that are specifically designed for a given dataset. We demonstrate how the evolved similarity functions can be used to perform clustering using a graph-based representation. The results of a variety of experiments across a range of large, high-dimensional datasets show that the proposed approach can achieve higher and more consistent performance than the benchmark methods. We further extend the proposed approach to automatically produce multiple complementary similarity functions by using a multi-tree approach, which gives further performance improvements. We also analyse the interpretability and structure of the automatically evolved similarity functions to provide insight into how and why they are superior to standard distance metrics.
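As a rough illustration of the idea summarised in the abstract, the sketch below clusters a toy dataset with a custom similarity function on a graph. It is not the authors' implementation: the `similarity` function is a hand-written, hypothetical stand-in for an evolved genetic programming tree, and the graph rule (link each instance to its most similar neighbour, then take connected components as clusters) is an assumed simplification. The names `similarity`, `cluster`, and `data` are all illustrative.

```python
def similarity(a, b):
    # Hypothetical stand-in for an evolved tree: it reads only features 0
    # and 2 (feature selection) and combines them arithmetically
    # (feature construction). Higher values mean "more similar".
    d0 = abs(a[0] - b[0])
    d2 = abs(a[2] - b[2])
    return -(d0 + 2.0 * d2)


def cluster(data, sim):
    # Assumed graph rule: connect every instance to its single most
    # similar neighbour, then label the connected components.
    n = len(data)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        best = max((j for j in range(n) if j != i),
                   key=lambda j: sim(data[i], data[j]))
        parent[find(i)] = find(best)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Toy data: feature 1 is noise; features 0 and 2 separate two groups.
data = [
    (0.0, 9.0, 0.1), (0.2, -3.0, 0.0), (0.1, 5.0, 0.2),
    (5.0, 9.0, 4.0), (5.1, -3.0, 4.2), (5.2, 5.0, 4.1),
]
labels = cluster(data, similarity)
print(labels)
```

Because the stand-in function ignores the noisy feature, the two groups are recovered here even though a plain Euclidean distance over all three features would be dominated by feature 1.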
Pages: 531-561
Number of pages: 31
Related papers
50 in total
  • [21] Evolving defence strategies by genetic programming
    Jackson, D
    GENETIC PROGRAMMING, PROCEEDINGS, 2005, 3447 : 281 - 290
  • [22] Evolving Frame Splitters by Genetic Programming
    Xie, Feng
    Song, Andy
    2012 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2012,
  • [23] Evolving Distributed Algorithms With Genetic Programming
    Weise, Thomas
    Tang, Ke
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2012, 16 (02) : 242 - 265
  • [24] Evolving texture features by Genetic Programming
    Aurnhammer, Melanie
    Applications of Evolutionary Computing, Proceedings, 2007, 4448 : 351 - 358
  • [25] Evolving a learning machine by genetic programming
    Alfaro-Cid, Eva
    Sharman, Ken
    Esparcia-Alcazar, Anna I.
    2006 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION, VOLS 1-6, 2006, : 254 - +
  • [26] A Study on Graph Representations for Genetic Programming
    Sotto, Leo Francoso D. P.
    Kaufmann, Paul
    Atkinson, Timothy
    Kalkreuth, Roman
    Basgalupp, Marcio Porto
    GECCO'20: PROCEEDINGS OF THE 2020 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2020, : 931 - 939
  • [27] A Genetic Programming Approach for Evolving Variable Selectors in Constraint Programming
    Nguyen, Su
    Thiruvady, Dhananjay
    Zhang, Mengjie
    Tan, Kay Chen
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2021, 25 (03) : 492 - 507
  • [28] A schema theory analysis of the evolution of size in genetic programming with linear representations
    McPhee, NF
    Poli, R
    GENETIC PROGRAMMING, PROCEEDINGS, 2001, 2038 : 108 - 125
  • [29] A Discriminative Framework for Clustering via Similarity Functions
    Balcan, Maria-Florina
    Blum, Avrim
    Vempala, Santosh
    STOC'08: PROCEEDINGS OF THE 2008 ACM INTERNATIONAL SYMPOSIUM ON THEORY OF COMPUTING, 2008, : 671 - +
  • [30] Evolving Aggressive Biomechanical Models with Genetic Programming
    Theodoridis, Theodoros
    Theodorakopoulos, Panos
    Hu, Huosheng
    IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010), 2010,