Semi-Supervised Learning with Data Augmentation for Tabular Data

Cited by: 3
Authors
Fang, Junpeng [1]
Tang, Caizhi [1]
Cui, Qing [1]
Zhu, Feng [1]
Li, Longfei [1]
Zhou, Jun [1]
Zhu, Wei [1]
Affiliations
[1] Ant Group, Hangzhou, People's Republic of China
Keywords
semi-supervised learning; tabular data; data augmentation
DOI
10.1145/3511808.3557699
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Data augmentation-based semi-supervised learning (SSL) methods have made great progress in computer vision and natural language processing. A key factor is that these data have a semantically invariant structure, which allows augmentation procedures (e.g., rotating images or masking words) to make full use of the large amount of unlabeled data. Tabular data, however, has no obvious invariant structure, so similar augmentation methods do not apply to it. To fill this gap, we present a simple yet efficient data augmentation method designed specifically for tabular data and apply it in an SSL algorithm, SDAT (Semi-supervised learning with Data Augmentation for Tabular data). We adopt a multi-task learning framework with two components: a data augmentation procedure and a consistency training procedure. The data augmentation procedure perturbs samples in latent space: it employs a variational auto-encoder (VAE) and uses the reconstructed samples as augmented samples. The consistency training procedure constrains the predictions to be invariant between each augmented sample and its corresponding original sample. By sharing a representation network (the encoder), we jointly train the two components to improve effectiveness and efficiency. Extensive experiments validate the effectiveness of the proposed method on tabular datasets.
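The two components described in the abstract can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not give the architecture or the loss functions, so the single-layer NumPy networks, the squared-error reconstruction term, the KL-divergence consistency term, and the weights `w_cons` and `w_vae` below are all assumptions chosen for illustration. The sketch only shows how a shared encoder can feed both a VAE-based augmentation step and a consistency term.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_features, n_hidden, n_latent, n_classes):
    """Small random weights (hypothetical layer sizes, no biases for brevity)."""
    def w(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))
    return {
        "W_enc": w(n_features, n_hidden),    # shared representation network
        "W_mu": w(n_hidden, n_latent),
        "W_logvar": w(n_hidden, n_latent),
        "W_dec": w(n_latent, n_features),    # VAE decoder
        "W_cls": w(n_latent, n_classes),     # classifier head on top of the encoder
    }

def encode(params, x):
    """Shared encoder: returns the mean and log-variance of the latent code."""
    h = np.tanh(x @ params["W_enc"])
    return h @ params["W_mu"], h @ params["W_logvar"]

def augment(params, x):
    """Perturb in latent space and decode; the reconstruction is the augmented sample."""
    mu, logvar = encode(params, x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps      # reparameterised latent sample
    return z @ params["W_dec"], mu, logvar

def predict(params, x):
    """Class probabilities computed from the shared encoder's latent mean."""
    mu, _ = encode(params, x)
    logits = mu @ params["W_cls"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def joint_loss(params, x_lab, y_lab, x_unlab, w_cons=1.0, w_vae=1.0):
    """Supervised + consistency + VAE terms (assumed forms, not from the paper)."""
    x_all = np.vstack([x_lab, x_unlab])
    x_aug, mu, logvar = augment(params, x_all)

    # VAE objective: reconstruction error plus KL divergence to a standard normal prior.
    recon = np.mean(np.sum((x_all - x_aug) ** 2, axis=1))
    kl_prior = np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1))

    # Supervised cross-entropy on the labeled rows only.
    p_lab = predict(params, x_lab)
    sup = -np.mean(np.log(p_lab[np.arange(len(y_lab)), y_lab] + 1e-12))

    # Consistency: predictions on original and augmented rows should agree.
    p_orig, p_aug = predict(params, x_all), predict(params, x_aug)
    cons = np.mean(np.sum(
        p_orig * (np.log(p_orig + 1e-12) - np.log(p_aug + 1e-12)), axis=1))

    total = sup + w_cons * cons + w_vae * (recon + kl_prior)
    return total, {"sup": sup, "cons": cons, "recon": recon, "kl_prior": kl_prior}

# Toy data: 4 labeled and 16 unlabeled rows with 6 numeric features.
params = init_params(n_features=6, n_hidden=8, n_latent=3, n_classes=2)
x_lab = rng.normal(size=(4, 6))
y_lab = np.array([0, 1, 0, 1])
x_unlab = rng.normal(size=(16, 6))
total, parts = joint_loss(params, x_lab, y_lab, x_unlab)
print(total, parts)
```

In a real training loop the same scalar would be minimised with an automatic-differentiation framework, so that gradients from all three terms update the shared encoder jointly.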
Pages: 3928-3932
Number of pages: 5