DataGen: a generator of datasets for evaluation of classification algorithms

Cited by: 13
Authors
Rachkovskij, DA [1]
Kussul, EM [1]
Affiliations
[1] Ukrainian Acad Sci, Cybernet Ctr, UA-252650 Kiev, Ukraine
Keywords
benchmarking; evaluation; classification; supervised learning; datasets; data generator; synthetic data
DOI
10.1016/S0167-8655(98)00053-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dataset generators are useful for evaluating an algorithm's performance because they allow control over the characteristics and amount of data used for benchmarking. We propose a dataset generator called DataGen that allows varying the number of input features and output classes, the complexity and realizations of class regions, the distributions of data samples, the noise level, and the number of data samples. A C language listing of the basic DataGen version is provided. (C) 1998 Published by Elsevier Science B.V. All rights reserved.
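To make the parameters named in the abstract concrete, here is a minimal C sketch of a parameterized synthetic dataset generator. It is not the DataGen listing from the paper, which is not reproduced in this record. The struct `GenParams`, the function `generate_dataset`, the nearest-center rule for class regions, the uniform sample distribution, and the label-replacement noise model are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parameter set; the names are illustrative, not from DataGen. */
typedef struct {
    int n_features;  /* number of input features */
    int n_classes;   /* number of output classes */
    int n_samples;   /* number of data samples */
    double noise;    /* probability of replacing a label with a random class */
    unsigned seed;   /* selects one realization of the class regions */
} GenParams;

/* Uniform pseudo-random number in [0, 1). */
static double uniform01(void) {
    return rand() / ((double)RAND_MAX + 1.0);
}

/*
 * Fill x (n_samples * n_features values, row-major) and y (n_samples labels).
 * Assumed scheme: each class owns one random center in the unit hypercube,
 * and a sample takes the class of its nearest center before noise is applied.
 * Returns 0 on success, -1 on invalid parameters or allocation failure.
 */
int generate_dataset(const GenParams *p, double *x, int *y) {
    assert(p != NULL && x != NULL && y != NULL);
    if (p->n_features < 1 || p->n_classes < 1 || p->n_samples < 0) return -1;
    if (p->noise < 0.0 || p->noise > 1.0) return -1;

    size_t nf = (size_t)p->n_features;
    size_t n_center_vals = (size_t)p->n_classes * nf;
    double *centers = malloc(n_center_vals * sizeof *centers);
    if (centers == NULL) return -1;

    srand(p->seed);  /* same seed gives the same dataset on a given platform */
    for (size_t i = 0; i < n_center_vals; i++) centers[i] = uniform01();

    for (int s = 0; s < p->n_samples; s++) {
        double *row = x + (size_t)s * nf;
        for (size_t f = 0; f < nf; f++) row[f] = uniform01();

        /* Find the nearest class center by squared Euclidean distance. */
        int best = 0;
        double best_d = -1.0;
        for (int c = 0; c < p->n_classes; c++) {
            double d = 0.0;
            for (size_t f = 0; f < nf; f++) {
                double diff = row[f] - centers[(size_t)c * nf + f];
                d += diff * diff;
            }
            if (best_d < 0.0 || d < best_d) {
                best_d = d;
                best = c;
            }
        }

        /* Label noise: with probability `noise`, draw the label at random. */
        if (uniform01() < p->noise) best = rand() % p->n_classes;
        y[s] = best;
    }

    free(centers);
    return 0;
}
```

A caller would allocate `n_samples * n_features` doubles for `x` and `n_samples` ints for `y`, fill in a `GenParams`, and check the return value. In this sketch, changing `seed` alone yields a different realization of the class regions with the same dimensions and noise level.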
Pages: 537-544
Number of pages: 8
Related papers
50 in total
  • [21] ON THE CLASSIFICATION OF ATTRIBUTE EVALUATION ALGORITHMS
    MARCELIS, AJJM
    SCIENCE OF COMPUTER PROGRAMMING, 1990, 14 (01) : 1 - 24
  • [22] Evaluation of Different Algorithms for Measuring the Similarities of Trajectory Datasets
    Savas, Nurullah Samed
    Bakkal, Fuat
    Eken, Suleyman
    Sayar, Ahmet
    2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [23] Evaluation of Face Recognition Algorithms on Avatar Face Datasets
    Yampolskiy, Roman V.
    Cho, Gyuchoon
    Rosenthal, Richard
    Gavrilova, Marina L.
    2011 INTERNATIONAL CONFERENCE ON CYBERWORLDS, 2011, : 93 - 99
  • [24] Generator of Synthetic Datasets for Hierarchical Sequential Pattern Mining Evaluation
    Sebek, Michal
    Zendulka, Jaroslav
    INFORMATICS 2013: PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE ON INFORMATICS, 2013, : 289 - 292
  • [25] Improving the classification performance of biological imbalanced datasets by swarm optimization algorithms
    Jinyan Li
    Simon Fong
    Sabah Mohammed
    Jinan Fiaidhi
    The Journal of Supercomputing, 2016, 72 : 3708 - 3728
  • [26] Analysis of Multiobjective Algorithms for the Classification of Multi-Label Video Datasets
    Karagoz, Gizem Nur
    Yazici, Adnan
    Dokeroglu, Tansel
    Cosar, Ahmet
    IEEE ACCESS, 2020, 8 : 163937 - 163952
  • [27] Improving the classification performance of biological imbalanced datasets by swarm optimization algorithms
    Li, Jinyan
    Fong, Simon
    Mohammed, Sabah
    Fiaidhi, Jinan
    JOURNAL OF SUPERCOMPUTING, 2016, 72 (10): : 3708 - 3728
  • [28] A Comparative Study of Anemia Classification Algorithms for International and Newly CBC Datasets
    Abdul-Jabbar, Safa S.
    Farhan, Alaa K.
    Luchinin, Alexander S.
    INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2023, 19 (06) : 141 - 157
  • [29] Comparative Analysis of Classification Algorithms on Three Different Datasets using WEKA
    Duriqi, Rafet
    Raca, Vigan
    Cico, Betim
    2016 5TH MEDITERRANEAN CONFERENCE ON EMBEDDED COMPUTING (MECO), 2016, : 335 - 338
  • [30] Comparison of Evaluation Metrics in Classification Applications with Imbalanced Datasets
    Fatourechi, Mehrdad
    Ward, Rabab K.
    Mason, Steven G.
    Huggins, Jane
    Schloegl, Alois
    Birch, Gary E.
    SEVENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2008, : 777 - +