Synthetic Data Generator for Classification Rules Learning

Times cited: 0
Authors
Liu, Runzong [1]
Fang, Bin [1]
Tang, Yuan Yan [2]
Chan, Patrick P. K. [3]
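Affiliations
[1] Chongqing University, College of Computer Science, Chongqing, People's Republic of China
[2] University of Macau, Faculty of Science and Technology, Macau, People's Republic of China
[3] South China University of Technology, School of Computer Science and Engineering, Guangzhou, Guangdong, People's Republic of China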
Keywords
Synthetic data; Automatic decision support; Data mining; Decision tree
DOI
10.1109/CCBD.2016.78
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
A standard data set is useful for empirically evaluating classification rule learning algorithms. However, no existing standard data set is general enough to cover the variety of situations in which such algorithms are applied. Real-world data sets are limited to specific applications, and their numbers of attributes, rules, and samples are fixed. This paper proposes a data generator that produces synthetic data sets as large as an experiment demands. The numbers of attributes, rules, and samples in the synthetic data sets can be changed easily to meet the evaluation demands of different learning algorithms. The generator first creates related attributes, then creates rules based on those attributes, and finally produces samples that follow the rules. Three decision tree algorithms are evaluated using synthetic data sets produced by the proposed generator.
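The three-stage pipeline described in the abstract (attributes, then rules, then samples) can be illustrated with the Python sketch below. It reflects our own assumptions and should not be read as the authors' generator. Under those assumptions, attributes are categorical with small integer domains. Each rule is a conjunction of "attribute == value" tests that predicts one class. Rules are applied as an ordered list, where the first match wins and a default class covers the rest. All function names (`make_attributes`, `make_rules`, `make_samples`, `classify`, `generate`) and parameters are hypothetical. The paper's notion of "related" attributes is not modeled here.

```python
import random
from dataclasses import dataclass


@dataclass
class Rule:
    """A conjunction of (attribute index == value) tests that predicts one class."""
    conditions: dict  # attribute index -> required value
    label: int

    def matches(self, sample: list) -> bool:
        return all(sample[a] == v for a, v in self.conditions.items())


def make_attributes(n_attrs: int, max_values: int, rng: random.Random) -> list:
    # Step 1 (assumed form): each attribute is categorical with values 0..k-1,
    # where the domain size k is drawn from [2, max_values].
    return [rng.randint(2, max_values) for _ in range(n_attrs)]


def make_rules(domains, n_rules, n_classes, max_conditions, rng):
    # Step 2: each rule tests between 1 and max_conditions distinct attributes.
    rules = []
    for _ in range(n_rules):
        k = rng.randint(1, min(max_conditions, len(domains)))
        attrs = rng.sample(range(len(domains)), k)
        conditions = {a: rng.randrange(domains[a]) for a in attrs}
        rules.append(Rule(conditions, rng.randrange(n_classes)))
    return rules


def classify(sample, rules, default_label):
    # Rules form an ordered list: the first matching rule decides the class.
    for rule in rules:
        if rule.matches(sample):
            return rule.label
    return default_label


def make_samples(domains, rules, n_samples, default_label, rng):
    # Step 3: start from random attribute values, then force the conditions of one
    # randomly chosen rule so that every rule is exercised by the data.
    data = []
    for _ in range(n_samples):
        sample = [rng.randrange(d) for d in domains]
        for a, v in rng.choice(rules).conditions.items():
            sample[a] = v
        # The label comes from the whole ordered rule list, so an earlier rule
        # that also matches takes precedence over the seeding rule.
        data.append((sample, classify(sample, rules, default_label)))
    return data


def generate(n_attrs=8, max_values=4, n_rules=5, n_classes=3,
             n_samples=1000, max_conditions=3, seed=0):
    # Every size is a parameter, so data sets can be scaled to the experiment.
    rng = random.Random(seed)
    domains = make_attributes(n_attrs, max_values, rng)
    rules = make_rules(domains, n_rules, n_classes, max_conditions, rng)
    samples = make_samples(domains, rules, n_samples, 0, rng)
    return domains, rules, samples


if __name__ == "__main__":
    domains, rules, samples = generate(n_samples=10)
    for rule in rules:
        print(rule)
    print(samples[0])
```

The labels are produced by a known rule list. As a result, a decision tree learned from `samples` can be compared against the ground-truth `rules`. Changing `n_attrs`, `n_rules`, and `n_samples` mirrors the abstract's point that these sizes are adjustable to suit the evaluation.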
Pages: 357-361
Number of pages: 5