Differentiable and Scalable Generative Adversarial Models for Data Imputation

Cited by: 4
Authors
Wu, Yangyang [1]
Wang, Jun [2]
Miao, Xiaoye [1]
Wang, Wenjia [2]
Yin, Jianwei [3]
Affiliations
[1] Zhejiang Univ, Ctr Data Sci, Hangzhou 310058, Peoples R China
[2] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
[3] Zhejiang Univ, Coll Comp Sci, Ctr Data Sci, Hangzhou 310058, Peoples R China
Keywords
Data imputation; generative adversarial network; large-scale incomplete data; EFFICIENT
DOI
10.1109/TKDE.2023.3293129
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data imputation has been extensively explored to solve the missing data problem. The dramatically increasing volume of incomplete data makes imputation models computationally infeasible in many real-life applications. In this paper, we propose an effective, scalable imputation system named SCIS that significantly speeds up the training of differentiable generative adversarial imputation models, with accuracy guarantees, for large-scale incomplete data. SCIS consists of two modules: differentiable imputation modeling (DIM) and sample size estimation (SSE). DIM leverages a new masking Sinkhorn divergence function to make an arbitrary generative adversarial imputation model differentiable. For such a differentiable imputation model, SSE estimates an appropriate sample size that ensures the user-specified imputation accuracy of the final model. Moreover, SCIS can also accelerate autoencoder-based imputation models. Extensive experiments on several real-life large-scale datasets demonstrate that the proposed system can accelerate generative adversarial model training by 6.23x. Using around 1.27% of the samples, SCIS yields accuracy competitive with state-of-the-art imputation methods in much shorter computation time.
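The abstract does not define the masking Sinkhorn divergence, so the following is only a rough sketch of the general idea and not the paper's formulation. It assumes that "masking" means comparing the imputed batch and the original batch on observed entries only (both multiplied elementwise by a 0/1 mask), and it uses a standard debiased entropic optimal-transport divergence with uniform sample weights. The function names, the `eps` and `n_iter` parameters, and the squared Euclidean cost are all illustrative assumptions.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def entropic_ot(x, y, eps=0.1, n_iter=200):
    """Approximate entropy-regularized OT value between two point clouds.

    Uses log-domain Sinkhorn iterations with uniform weights and a squared
    Euclidean cost, and returns the dual objective <a, f> + <b, g>.
    """
    n, m = x.shape[0], y.shape[0]
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        # Alternating soft-min updates of the two dual potentials.
        f = -eps * _logsumexp((g[None, :] - cost) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + log_a[:, None], axis=0)
    return float(np.mean(f) + np.mean(g))


def masked_sinkhorn_divergence(x_hat, x, mask, eps=0.1, n_iter=200):
    """Debiased Sinkhorn divergence restricted to observed entries.

    Assumption (not taken from the paper): entries where mask == 0 are
    zeroed out in both batches before the divergence is computed.
    """
    a = x_hat * mask
    b = x * mask
    return (
        entropic_ot(a, b, eps, n_iter)
        - 0.5 * entropic_ot(a, a, eps, n_iter)
        - 0.5 * entropic_ot(b, b, eps, n_iter)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(16, 4))                      # synthetic "true" batch
    mask = (rng.random(x.shape) > 0.2).astype(float)  # 1 = observed, 0 = missing
    x_hat = x + 0.5 * rng.normal(size=x.shape)        # synthetic noisy reconstruction
    print(masked_sinkhorn_divergence(x_hat, x, mask))
```

Because the divergence is built from smooth soft-min updates, porting the same computation to an automatic-differentiation framework would give gradients with respect to `x_hat`, which is the property the abstract attributes to DIM. How SCIS actually constructs and trains with its divergence should be taken from the paper itself.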
Pages: 490-503
Page count: 14
Related papers
50 in total
  • [21] Customized generative adversarial data imputation model for industrial soft sensing
    Yao Z.-J.
    Zhao C.-H.
    Li Y.-L.
    Fu C.
    Qiao H.-L.
    Kongzhi yu Juece/Control and Decision, 2021, 36 (12) : 2929 - 2936
  • [22] Identifiable Generative Models for Missing Not at Random Data Imputation
    Ma, Chao
    Zhang, Cheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [23] Categorical EHR Imputation with Generative Adversarial Nets
    Yang, Yinchong
    Wu, Zhiliang
    Tresp, Volker
    Fasching, Peter A.
    2019 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2019 : 27 - 36
  • [24] Improved generative adversarial network with deep metric learning for missing data imputation
    Al-taezi, Mohammed Ali
    Wang, Yu
    Zhu, Pengfei
    Hu, Qinghua
    Al-badwi, Abdulrahman
    NEUROCOMPUTING, 2024, 570
  • [25] Missing Data Imputation in Transformer District Based on Improved Generative Adversarial Network
    Liu K.
    Zhou F.
    Zhou H.
    Wang C.
    Dianwang Jishu/Power System Technology, 2022, 46 (08) : 3231 - 3239
  • [26] Generative Adversarial Networks Assist Missing Data Imputation: A Comprehensive Survey and Evaluation
    Shahbazian, Reza
    Greco, Sergio
    IEEE ACCESS, 2023, 11 : 88908 - 88928
  • [27] A data imputation method for multivariate time series based on generative adversarial network
    Guo, Zijian
    Wan, Yiming
    Ye, Hao
    NEUROCOMPUTING, 2019, 360 : 185 - 197
  • [28] STGAN: Spatio-Temporal Generative Adversarial Network for Traffic Data Imputation
    Yuan, Ye
    Zhang, Yong
    Wang, Boyue
    Peng, Yuan
    Hu, Yongli
    Yin, Baocai
    IEEE TRANSACTIONS ON BIG DATA, 2023, 9 (01) : 200 - 211
  • [29] Imputation of missing data with class imbalance using conditional generative adversarial networks
    Awan, Saqib Ejaz
    Bennamoun, Mohammed
    Sohel, Ferdous
    Sanfilippo, Frank
    Dwivedi, Girish
    NEUROCOMPUTING, 2021, 453 : 164 - 171
  • [30] Multi-task Generative Adversarial Network for Missing Mobility Data Imputation
    Shi, Meihui
    Shen, Derong
    Kou, Yue
    Nie, Tiezheng
    Yu, Ge
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022 : 4480 - 4484