Hierarchical data generator based on tree-structured stick breaking process for benchmarking clustering methods

Cited by: 4
Authors
Olech, Lukasz P. [1, 2]
Spytkowski, Michal [1]
Kwasnicka, Halina [1]
Michalewicz, Zbigniew [2, 3, 4]
Affiliations
[1] Wroclaw Univ Sci & Technol, Dept Computat Intelligence, Wybrzeze Stanislawa Wyspianskiego 27, PL-50370 Wroclaw, Poland
[2] Complex Pty Ltd, 155 Brebner Dr, West Lakes, SA 5021, Australia
[3] Polish Acad Sci, Inst Comp Sci, Ul Ordona 21, PL-01237 Warsaw, Poland
[4] Polish Japanese Acad Informat Technol, Ul Koszykowa 86, PL-02008 Warsaw, Poland
Keywords
Artificial Data; Benchmark Data; Benchmark Data Generator; Hierarchical Clustering; Object Cluster Hierarchy; Tree-Structured Stick Breaking Process; Clustering Evaluation; Cluster Analysis; ALGORITHMS; MODEL
DOI
10.1016/j.ins.2020.12.020
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A new variant of Hierarchical Cluster Analysis, called Object Cluster Hierarchy, is gaining interest in the field of Machine Learning. Because the concept is still at an early stage of development, the lack of tools for systematic analysis of Object Cluster Hierarchies inhibits its further improvement. In this paper we address this issue by proposing a generator of synthetic hierarchical data that can be used for benchmarking Object Cluster Hierarchy generation methods. The article presents a thorough empirical and theoretical analysis of the generator and provides guidance on how to control its parameters. The conducted experiments show the usefulness of the data generator, which is capable of producing a wide range of differently structured data. Furthermore, datasets that represent the most common types of hierarchies are generated and made available to the public for benchmarking, along with the developed generator (http://kio.pwr.edu.pl/?page_id=396). © 2020 Elsevier Inc. All rights reserved.
Pages: 99-119
Page count: 21
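
The tree-structured stick breaking process named in the title and abstract can be illustrated with a minimal sketch. This is not the authors' generator: it only follows the standard construction, in which an object stops at a node with a Beta-distributed probability and otherwise descends to a child chosen by ordinary stick breaking. The names `Node`, `sample_path`, `sample_assignments`, the parameters `alpha0`, `lam`, `gamma`, and the `max_depth` truncation are all assumptions made for illustration.

```python
import random


class Node:
    """One cluster in the hierarchy, created lazily when first visited."""

    def __init__(self, depth, rng, alpha0, lam):
        self.depth = depth
        # Probability that an object reaching this node stops here.
        # Beta(1, alpha0 * lam**depth) is an assumed, commonly used choice.
        self.nu = rng.betavariate(1.0, alpha0 * lam ** depth)
        self.psi = []       # stick-breaking proportions over the children
        self.children = []  # child nodes, index-aligned with psi


def sample_path(root, rng, alpha0, lam, gamma, max_depth):
    """Assign one object to a node; return the child indices from the root."""
    node, path = root, []
    # Stop here with probability nu, or when the assumed depth cap is hit.
    while node.depth < max_depth and rng.random() >= node.nu:
        i = 0
        while True:
            if i == len(node.children):
                # Break a new piece off the remaining stick for a new child.
                node.psi.append(rng.betavariate(1.0, gamma))
                node.children.append(Node(node.depth + 1, rng, alpha0, lam))
            if rng.random() < node.psi[i]:
                break  # child i is chosen
            i += 1
        path.append(i)
        node = node.children[i]
    return tuple(path)


def sample_assignments(n, alpha0=1.0, lam=0.5, gamma=1.0, max_depth=4, seed=0):
    """Draw node assignments for n objects from one shared random tree."""
    rng = random.Random(seed)
    root = Node(0, rng, alpha0, lam)
    return [sample_path(root, rng, alpha0, lam, gamma, max_depth)
            for _ in range(n)]
```

Each returned tuple is the path of child indices from the root to the node an object is assigned to, and the empty tuple denotes the root itself. Drawing feature vectors for the objects in each node, which a full benchmark generator would also need, is left out of this sketch.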