Detecting Errors with Zero-Shot Learning

Cited by: 2
Authors
Wu, Xiaoyu [1 ,2 ]
Wang, Ning [1 ,2 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[2] Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
error detection; zero-shot learning; self-attention mechanism; knowledge base
DOI
10.3390/e24070936
CLC Number
O4 [Physics];
Discipline Classification Code
0702;
Abstract
Error detection is a critical step in data cleaning. Most traditional error detection methods are based on rules and external information, which are costly to obtain, especially when dealing with large-scale data. Recently, with the advances of deep learning, some researchers have focused on learning the semantic distribution of data for error detection; however, the low error rate in real datasets makes it hard to collect negative samples for training supervised deep learning models. Most existing deep-learning-based error detection algorithms address this class imbalance through data augmentation, but because the negative samples are inadequately sampled, the features learned by those methods may be biased. In this paper, we propose an AEGAN (Auto-Encoder Generative Adversarial Network)-based deep learning model named SAT-GAN (Self-Attention Generative Adversarial Network) to detect errors in relational datasets. Combining the self-attention mechanism with a pre-trained language model, our model can capture semantic features of the dataset, specifically the functional dependencies between attributes, so that no rules or constraints are needed for SAT-GAN to identify inconsistent data. To cope with the lack of negative samples, we propose to train our model via zero-shot learning. As a model tailored to clean data, SAT-GAN learns the latent features of clean data and recognizes erroneous data as outliers. In our evaluation, SAT-GAN achieves an average F1-score of 0.95 on five datasets, which yields at least a 46.2% F1-score improvement over rule-based methods and outperforms state-of-the-art deep learning approaches in the absence of rules and negative samples.
Pages: 14
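
To make the clean-data-only (zero-shot) setup described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it trains a self-attention autoencoder on clean tuples only and flags tuples whose reconstruction error is anomalously high. The adversarial component of SAT-GAN is omitted, `embed_value` is a hash-based stand-in for the pre-trained language-model embeddings, and all names, dimensions, and the threshold rule are illustrative assumptions.

```python
# Sketch of zero-shot error detection on clean data only: a self-attention
# autoencoder is trained on clean tuples; tuples that reconstruct poorly are
# flagged as outliers. Illustrative only, not the paper's actual code.
import torch
import torch.nn as nn

EMB_DIM = 32  # per-attribute embedding size (stand-in for LM embeddings)

def embed_value(value: str) -> torch.Tensor:
    """Placeholder for a pre-trained language-model embedding: a fixed
    random vector seeded by the value's hash, so equal strings map to
    equal vectors within one run."""
    g = torch.Generator().manual_seed(hash(value) % (2**31))
    return torch.randn(EMB_DIM, generator=g)

class TupleAutoencoder(nn.Module):
    """Self-attention over attribute embeddings lets each attribute be
    reconstructed in the context of the others, so inter-attribute
    dependencies (e.g. city -> zip) can be picked up from clean data."""
    def __init__(self, emb_dim: int = EMB_DIM):
        super().__init__()
        self.attn = nn.MultiheadAttention(emb_dim, num_heads=4, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, emb_dim)
        )

    def forward(self, x):            # x: (batch, n_attrs, emb_dim)
        ctx, _ = self.attn(x, x, x)  # each attribute attends to the others
        return self.decoder(ctx)     # reconstruct attribute embeddings

def tuple_tensor(row):
    return torch.stack([embed_value(v) for v in row])

# --- train on clean rows only (the zero-shot setting: no error labels) ---
clean_rows = [("Berlin", "10115"), ("Berlin", "10115"), ("Paris", "75001"),
              ("Paris", "75001"), ("Rome", "00184")] * 40
data = torch.stack([tuple_tensor(r) for r in clean_rows])

model = TupleAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(data), data)
    loss.backward()
    opt.step()

# --- score unseen rows: high reconstruction error => likely erroneous ---
def score(row):
    with torch.no_grad():
        x = tuple_tensor(row).unsqueeze(0)
        return nn.functional.mse_loss(model(x), x).item()

threshold = max(score(r) for r in set(clean_rows)) * 1.5  # illustrative rule
# An unseen (city, zip) pairing is out of the training distribution and
# typically reconstructs worse than the pairings seen during training.
for row in [("Berlin", "10115"), ("Berlin", "75001")]:
    print(row, "ERROR" if score(row) > threshold else "ok", round(score(row), 4))
```

The design point this illustrates is the one the abstract emphasizes: because only clean data are needed for training, no error labels, rules, or constraints enter the pipeline, and inconsistency surfaces purely as out-of-distribution reconstruction behavior.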