SAWTab: Smoothed Adaptive Weighting for Tabular Data in Semi-supervised Learning

Authors
Gharasuie, Morteza Mohammady [1]
Wang, Fengjiao [2]
Sharif, Omar [1]
Mukkamala, Ravi [1]
Affiliations
[1] Old Dominion Univ, Norfolk, VA 23529 USA
[2] Univ Utah, Salt Lake City, UT 84112 USA
Keywords
Semi-supervised learning; Feature representation; Pseudo-label; Tabular domain; Adaptive weighting
DOI
10.1007/978-981-97-2259-4_24
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised and semi-supervised learning (SSL) on tabular data is an understudied topic. Despite some attempts, two major challenges remain: (1) tabular datasets are often imbalanced; (2) the one-hot encoding used in existing methods becomes less efficient for high-cardinality categorical features. To address these challenges, we propose SAWTab, which uses a target encoding method, Conditional Probability Representation (CPR), to represent categorical features efficiently in the input space. We improve this representation by incorporating the unlabeled samples through pseudo-labels. Furthermore, we propose a Smooth Adaptive Weighting mechanism in the target encoding to mitigate the issue of noisy and biased pseudo-labels. Experimental results on various datasets and comparisons with existing frameworks show that SAWTab yields the best test accuracy on all datasets. We find that pseudo-labels can help improve the input space representation in the SSL setting, which enhances the generalization of the learning algorithm.
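The idea of a conditional-probability target encoding that also uses weighted pseudo-labels can be illustrated with a minimal sketch. This is not the SAWTab implementation: the function name, the additive smoothing term, and the use of a per-sample confidence weight are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
from collections import defaultdict


def conditional_probability_encoding(categories, labels, weights=None,
                                     n_classes=2, smoothing=1.0):
    """Map each category value to smoothed estimates of P(class | category).

    Hypothetical helper, not the paper's method. `weights` lets
    pseudo-labeled samples count less than labeled ones (assumed here
    to be a per-sample confidence in [0, 1]).
    """
    if weights is None:
        weights = [1.0] * len(categories)

    # Accumulate weighted class counts for every category value.
    counts = defaultdict(lambda: [0.0] * n_classes)
    for cat, y, w in zip(categories, labels, weights):
        counts[cat][y] += w

    # Additive smoothing keeps rare categories away from 0 and 1.
    encoding = {}
    for cat, class_counts in counts.items():
        total = sum(class_counts) + smoothing * n_classes
        encoding[cat] = [(c + smoothing) / total for c in class_counts]
    return encoding


# Three labeled samples (weight 1.0) and one pseudo-labeled sample
# whose assumed confidence weight is 0.5.
cats = ["a", "a", "b", "b"]
ys = [0, 1, 1, 1]
ws = [1.0, 1.0, 1.0, 0.5]
enc = conditional_probability_encoding(cats, ys, ws)
```

Each categorical value is replaced by a vector of length `n_classes` instead of a one-hot vector whose length grows with the number of distinct values, which is the efficiency argument made for high-cardinality features.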
Pages: 316-328
Number of pages: 13