Synthetic Generation of High-Dimensional Datasets

Cited by: 31
Authors
Albuquerque, Georgia [1]
Loewe, Thomas [1]
Magnor, Marcus [1]
Affiliations
[1] TU Braunschweig, Comp Graph Lab, Braunschweig, Germany
Keywords
Synthetic data generation; multivariate data; high-dimensional data; interaction
DOI
10.1109/TVCG.2011.237
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Generation of synthetic datasets is a common practice in many research areas. Such data is often generated to meet specific needs or certain conditions that may not be easily found in the original, real data. The nature of the data varies according to the application area and includes text, graphs, social or weather data, among many others. Such synthetic datasets are commonly created with small scripts or programs that are restricted to small problems or to a specific application. In this paper we propose a framework designed to generate high-dimensional datasets. Users can interactively create and navigate through multidimensional datasets using a suitable graphical user interface. The data creation is driven by statistical distributions based on a few user-defined parameters. First, a grounding dataset is created according to given inputs, and then structures and trends are included in selected dimensions and orthogonal projection planes. Furthermore, our framework supports the creation of complex non-orthogonal trends and classified datasets. It can successfully be used to create synthetic datasets simulating important trends such as multidimensional clusters, correlations, and outliers.
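The two-stage process described in the abstract (a grounding dataset first, then structures added in selected dimensions) can be illustrated with a minimal sketch. This is not the authors' framework or its interface: every function name, parameter, and distribution choice below is a hypothetical assumption made for illustration, and only the Python standard library is used.

```python
import random

# Hypothetical illustration only; none of these names come from the paper.

def make_base(n_points, n_dims, rng):
    """Grounding dataset: independent uniform noise in [0, 1) per dimension."""
    return [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]

def add_cluster(data, rows, dims, center, spread, rng):
    """Overwrite the chosen dimensions of the chosen rows with Gaussian samples."""
    for r in rows:
        for d, c in zip(dims, center):
            data[r][d] = rng.gauss(c, spread)

def add_correlation(data, rows, src, dst, slope, noise, rng):
    """Make dimension dst a noisy linear function of dimension src."""
    for r in rows:
        data[r][dst] = slope * data[r][src] + rng.gauss(0.0, noise)

def add_outliers(data, rows, dims, offset):
    """Shift the chosen rows far outside the base range in the chosen dimensions."""
    for r in rows:
        for d in dims:
            data[r][d] += offset

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)                      # fixed seed for reproducibility
data = make_base(300, 6, rng)               # stage 1: grounding dataset
add_cluster(data, range(0, 100), dims=(0, 1, 2),
            center=(0.2, 0.8, 0.5), spread=0.02, rng=rng)
add_correlation(data, range(100, 295), src=3, dst=4,
                slope=2.0, noise=0.05, rng=rng)
add_outliers(data, range(295, 300), dims=(5,), offset=10.0)

r = pearson([row[3] for row in data[100:295]],
            [row[4] for row in data[100:295]])
print(f"correlation between dimensions 3 and 4: {r:.3f}")
```

Each structure here touches a disjoint set of rows and dimensions, so the steps do not overwrite one another. A real generator would also need a policy for overlapping structures and for the non-orthogonal trends the abstract mentions, which this sketch does not attempt.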
Pages: 2317-2324
Page count: 8
Related papers
50 in total
  • [1] Joining massive high-dimensional datasets
    Kahveci, T
    Lang, CA
    Singh, AK
    [J]. 19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, : 265 - 276
  • [2] Cluster validation for high-dimensional datasets
    Kim, M
    Yoo, H
    Ramakrishna, RS
    [J]. ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS, PROCEEDINGS, 2004, 3192 : 178 - 187
  • [3] Visual terrain analysis of high-dimensional datasets
    Li, W
    Ong, KL
    Ng, WK
    [J]. KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2005, 2005, 3721 : 593 - 600
  • [4] Pattern discovery for high-dimensional binary datasets
    Snasel, Vaclav
    Moravec, Pavel
    Husek, Dusan
    Frolov, Alexander
    Rezankova, Hana
    Polyakov, Pavel
    [J]. NEURAL INFORMATION PROCESSING, PART I, 2008, 4984 : 861 - +
  • [5] Quantifying and comparing features in high-dimensional datasets
    Piringer, Harald
    Berger, Wolfgang
    Hauser, Helwig
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL INFORMATION VISUALISATION, 2008, : 240 - 245
  • [6] INTEGRATIVE EXPLORATION OF LARGE HIGH-DIMENSIONAL DATASETS
    Pardy, Christopher
    Galbraith, Sally
    Wilson, Susan R.
    [J]. ANNALS OF APPLIED STATISTICS, 2018, 12 (01): : 178 - 199
  • [7] An immune approach to classifying the high-dimensional datasets
    Chmielewski, Andrzej
    Wierzchon, Slawomir T.
    [J]. 2008 INTERNATIONAL MULTICONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY (IMCSIT), VOLS 1 AND 2, 2008, : 79 - +
  • [8] Detecting Trivariate Associations in High-Dimensional Datasets
    Liu, Chuanlu
    Wang, Shuliang
    Yuan, Hanning
    Dang, Yingxu
    Liu, Xiaojia
    [J]. SENSORS, 2022, 22 (07)
  • [9] High-dimensional feature selection for genomic datasets
    Afshar, Majid
    Usefi, Hamid
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 206
  • [10] A general framework for clustering high-dimensional datasets
    Zhao, YC
    Junde, S
    [J]. CCECE 2003: CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, VOLS 1-3, PROCEEDINGS: TOWARD A CARING AND HUMANE TECHNOLOGY, 2003, : 1091 - 1094