XyGen: Synthetic data generator for feature selection

Times cited: 2
Authors
Kamalov, Firuz [1]
Elnaffar, Said [1]
Sulieman, Hana [2]
Cherukuri, Aswani Kumar [3]
Affiliations
[1] Canadian University Dubai, Dubai, United Arab Emirates
[2] American University of Sharjah, Sharjah, United Arab Emirates
[3] Vellore Institute of Technology, Vellore, India
Keywords
Feature selection; Synthetic data; Machine learning; Data mining; Mutual information; Algorithms
DOI
10.1016/j.simpa.2023.100485
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Given the large number of feature selection algorithms, a uniform procedure for evaluating their performance has become essential. We propose a library of synthetic datasets designed specifically to test the effectiveness of feature selection algorithms. The datasets are inspired by applications in electronics and cover a range of characteristics, providing a variety of test scenarios. The software is a Python library with a standard interface for loading and generating datasets. Each dataset is implemented as a function whose parameters control the properties of the generated data.
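The abstract describes two ideas: each dataset is a parameterized generator function, and evaluation is uniform because the truly relevant features are known in advance. The sketch below shows both ideas under stated assumptions. It is not XyGen's actual API. The names `make_xor_data` and `selection_score`, their parameters, and the XOR target (a logic-gate pattern chosen here to echo the electronics theme) are all hypothetical.

```python
import numpy as np


def make_xor_data(n_samples=1000, n_noise=8, noise_level=0.0, seed=None):
    """Hypothetical generator (not XyGen's API).

    The label is the XOR of two binary informative features. The remaining
    columns are irrelevant binary noise. Returns (X, y, relevant), where
    `relevant` lists the ground-truth relevant column indices.
    """
    rng = np.random.default_rng(seed)
    # Two informative binary inputs, like the inputs of an XOR gate.
    informative = rng.integers(0, 2, size=(n_samples, 2))
    y = np.logical_xor(informative[:, 0], informative[:, 1]).astype(int)
    # Irrelevant binary features that carry no information about y.
    noise = rng.integers(0, 2, size=(n_samples, n_noise))
    X = np.hstack([informative, noise]).astype(float)
    # Optionally flip a fraction of labels to simulate measurement noise.
    if noise_level > 0:
        flip = rng.random(n_samples) < noise_level
        y = np.where(flip, 1 - y, y)
    relevant = [0, 1]
    return X, y, relevant


def selection_score(selected, relevant):
    """Fraction of the truly relevant features recovered by a selector."""
    return len(set(selected) & set(relevant)) / len(relevant)


if __name__ == "__main__":
    X, y, relevant = make_xor_data(n_samples=500, n_noise=6, seed=42)
    print(X.shape, y.shape, relevant)
    # Score a hypothetical selector output against the known ground truth.
    print(selection_score([0, 3], relevant))
```

The generator returns the ground-truth relevant columns alongside the data. Because of this, any selector's output can be scored in the same way, which is the kind of uniform evaluation the abstract calls for. XOR is a deliberately hard case. With balanced, independent inputs, each input on its own is statistically independent of the label, so a univariate filter (for example, ranking features by correlation with `y`) is expected to struggle.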
Pages: 3