Data Augmentation by Guided Deep Interpolation

Cited by: 13
Authors
Szlobodnyik, Gergely [1, 2]
Farkas, Lorant [1]
Affiliations
[1] Nokia, Bell Labs, Bokay St 36-42, Budapest, Hungary
[2] Pazmany Peter Catholic Univ, Dept Informat Technol & Bion, Budapest, Hungary
Keywords
Autoencoders; Data augmentation; Imbalanced data; Self-expressiveness; Interpolation; IMAGE; REPRESENTATIONS; NETWORKS;
DOI
10.1016/j.asoc.2021.107680
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art machine learning algorithms require large amounts of high-quality data. In practice, however, the sample size is commonly low and the data is imbalanced across class labels. Low sample size and an imbalanced class distribution can significantly deteriorate the predictive performance of machine learning models. To overcome these data quality issues, we propose a novel data augmentation method, Guided Deep Interpolation (GDI). It is based on a convolutional auto-encoder network equipped with an auxiliary linear self-expressive layer. The network is trained by minimizing a composite objective function so as to extract the underlying clustered structure of semantic similarities among data points while preserving high reconstruction quality. The trained network is used to define a sampling strategy and a synthetic data generation procedure. Using the weights of the self-expressive layer, we introduce a measure of semantic variability that quantifies how similar a data point is to other data points on average. Based on this measure, a joint distribution is defined. From this distribution we can draw pairs of similar data points such that one point is semantically underrepresented (isolated) while its pair possesses relatively high semantic variability. A sampled pair is interpolated in the deep feature space of the network so as to increase semantic variability while preserving the class label of the semantically underrepresented data point. The trained decoder is used to determine the pixel-space representations of the latent-space interpolations. The resulting data augmentation procedure generates synthetic samples by increasing the semantic variability of semantically underrepresented instances in a class-label-preserving way. Our experimental results show that the proposed method outperforms traditional and generative model-based data augmentation methods on low-sample-size and imbalanced data sets.
(C) 2021 Elsevier B.V. All rights reserved.
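The sampling and interpolation steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the variability score (mean symmetrised absolute self-expressive weight), the inverse-variability sampling of the isolated point, the affinity-times-variability weighting of its partner, the cap `alpha_max`, and all function names are hypothetical and are not taken from the paper. The matrix `C` stands for the learned self-expressive weights and `Z` for the encoder's latent codes; decoding the interpolated code back to pixel space with the trained decoder is left out.

```python
import numpy as np


def semantic_variability(C):
    """Assumed score: mean symmetrised |weight| linking each point to the others."""
    A = np.abs(C) + np.abs(C.T)          # symmetric affinity built from the weights
    np.fill_diagonal(A, 0.0)             # a point does not count as its own neighbour
    v = A.sum(axis=1) / (A.shape[0] - 1)
    return v, A


def sample_pair(C, rng, eps=1e-12):
    """Draw (i, j): i tends to be isolated, j is similar to i and well connected."""
    v, A = semantic_variability(C)
    # Assumption: isolated points (low variability) are drawn more often.
    p_iso = 1.0 / (v + eps)
    p_iso /= p_iso.sum()
    i = rng.choice(len(v), p=p_iso)
    # Assumption: the partner is weighted by affinity to i times its own variability.
    w = A[i] * v
    if w.sum() <= 0.0:                   # i has no neighbours: fall back to variability only
        w = v + eps
        w[i] = 0.0
    j = rng.choice(len(v), p=w / w.sum())
    return i, j


def interpolate_latent(Z, i, j, rng, alpha_max=0.5):
    """Convex combination of two latent codes that stays closer to the isolated point i."""
    alpha = rng.uniform(0.0, alpha_max)
    return (1.0 - alpha) * Z[i] + alpha * Z[j]
```

A possible use, given a weight matrix `C` of shape `(n, n)` and latent codes `Z` of shape `(n, d)`: create `rng = np.random.default_rng(0)`, draw `i, j = sample_pair(C, rng)`, and compute `z_new = interpolate_latent(Z, i, j, rng)`. Keeping `alpha` below 0.5 is one simple way to bias the synthetic point toward the isolated sample, in the spirit of the label-preserving interpolation the abstract describes.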
Pages: 10