Data Augmentation by Guided Deep Interpolation

Cited by: 13
Authors
Szlobodnyik, Gergely [1, 2]
Farkas, Lorant [1]
Affiliations
[1] Nokia, Bell Labs, Bokay St 36-42, Budapest, Hungary
[2] Pazmany Peter Catholic Univ, Dept Informat Technol & Bion, Budapest, Hungary
Keywords
Autoencoders; Data augmentation; Imbalanced data; Self-expressiveness; Interpolation; IMAGE; REPRESENTATIONS; NETWORKS;
DOI
10.1016/j.asoc.2021.107680
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
State-of-the-art machine learning algorithms require large amounts of high-quality data. In practice, however, the sample size is commonly small and the data are imbalanced across class labels. Small sample size and imbalanced class distribution can significantly degrade the predictive performance of machine learning models. To overcome these data quality issues, we propose a novel data augmentation method, Guided Deep Interpolation (GDI). It is based on a convolutional auto-encoder network equipped with an auxiliary linear self-expressive layer. The network is trained by minimizing a composite objective function so as to extract the underlying clustered structure of semantic similarities among data points while preserving high reconstruction quality. The trained network is used to define a sampling strategy and a synthetic data generation procedure. Using the weights of the self-expressive layer, we introduce a measure of semantic variability that quantifies how similar a data point is to other data points on average. Based on this measure, a joint distribution is defined. From this distribution we can draw pairs of similar data points such that one point is semantically underrepresented (isolated) while its pair has relatively high semantic variability. A sampled pair is interpolated in the deep feature space of the network so as to increase semantic variability while preserving the class label of the semantically underrepresented data point. The trained decoder is used to map latent-space interpolations to pixel-space representations. The resulting data augmentation procedure generates synthetic samples by increasing the semantic variability of semantically underrepresented instances in a class-label-preserving way. Our experimental results show that the proposed method outperforms traditional and generative model-based data augmentation methods on small and imbalanced data sets.
(C) 2021 Elsevier B.V. All rights reserved.
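The sampling and interpolation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the formula for semantic variability or for the joint distribution, so the mean symmetric affinity used here, the inverse-variability anchor weighting, the same-class restriction on partners, and the names `semantic_variability`, `sample_pair`, `interpolate`, and `max_lam` are all assumptions made for illustration. The matrix `C` stands for the learned self-expressive weights and `Z` for the encoder's latent codes; the decoder step is omitted.

```python
import numpy as np

def semantic_variability(C):
    """Assumed proxy: mean symmetric affinity of each point to all others."""
    A = 0.5 * (np.abs(C) + np.abs(C).T)   # symmetric affinity from self-expressive weights
    np.fill_diagonal(A, 0.0)              # ignore self-similarity
    return A, A.sum(axis=1) / (A.shape[0] - 1)

def sample_pair(A, v, labels, rng, eps=1e-8):
    """Draw (anchor, partner): an isolated anchor and a similar, high-variability partner."""
    # Anchor: favour points with low semantic variability (assumed weighting).
    p_anchor = 1.0 / (v + eps)
    p_anchor /= p_anchor.sum()
    i = rng.choice(len(v), p=p_anchor)
    # Partner: same class as the anchor, weighted by affinity to it and by variability.
    w = A[i] * v * (labels == labels[i])
    w[i] = 0.0
    if w.sum() <= 0:
        return i, None                    # no usable partner for this anchor
    j = rng.choice(len(v), p=w / w.sum())
    return i, j

def interpolate(Z, i, j, rng, max_lam=0.5):
    """Convex combination in latent space that stays closer to the anchor i."""
    lam = rng.uniform(0.0, max_lam)
    return (1.0 - lam) * Z[i] + lam * Z[j]
```

Capping the mixing coefficient below 0.5 keeps each synthetic latent code nearer to the isolated anchor than to its partner, which is one simple way to favour preserving the anchor's class label. In a full pipeline the interpolated code would then be passed through the trained decoder to obtain a pixel-space sample.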
Pages: 10
Related papers
50 in total
  • [31] Deep Adversarial Data Augmentation for Extremely Low Data Regimes
    Zhang, Xiaofeng
    Wang, Zhangyang
    Liu, Dong
    Lin, Qifeng
    Ling, Qing
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (01) : 15 - 28
  • [32] DEEP PANCHROMATIC IMAGE GUIDED RESIDUAL INTERPOLATION FOR MULTISPECTRAL IMAGE DEMOSAICKING
    Pan, Zhihong
    Li, Baopu
    Bao, Yingze
    Cheng, Hsuchun
    [J]. 2019 10TH WORKSHOP ON HYPERSPECTRAL IMAGING AND SIGNAL PROCESSING - EVOLUTION IN REMOTE SENSING (WHISPERS), 2019,
  • [33] Discrepancy-Guided Domain-Adaptive Data Augmentation
    Gao, Jian
    Hua, Yang
    Hu, Guosheng
    Wang, Chi
    Robertson, Neil M.
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 5064 - 5075
  • [34] Smooth-Guided Implicit Data Augmentation for Domain Generalization
    Wang, Mengzhu
    Liu, Junze
    Luo, Ge
    Wang, Shanshan
    Wang, Wei
    Lan, Long
    Wang, Ye
    Nie, Feiping
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [35] Influence-guided Data Augmentation for Neural Tensor Completion
    Oh, Sejoon
    Kim, Sungchul
    Rossi, Ryan A.
    Kumar, Srijan
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 1386 - 1395
  • [36] A deep learning method for extensible microstructural quantification of DP steel enhanced by physical metallurgy-guided data augmentation
    Shen, Chunguang
    Wei, Xiaolu
    Wang, Chenchong
    Xu, Wei
    [J]. MATERIALS CHARACTERIZATION, 2021, 180
  • [37] ENGAGE: Explanation Guided Data Augmentation for Graph Representation Learning
    Shi, Yucheng
    Zhou, Kaixiong
    Liu, Ninghao
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT III, 2023, 14171 : 104 - 121
  • [38] Wood-species identification based on terahertz spectral data augmentation and pseudo-label guided deep clustering
    Wang, Yuan
    Wang, Zhi-Gang
    He, Yi-Hao
    Avramidis, Stavros
    [J]. WOOD MATERIAL SCIENCE & ENGINEERING, 2024, 19 (05) : 1004 - 1014
  • [39] Data augmentation guided knowledge distillation for environmental sound classification
    Tripathi, Achyut Mani
    Paul, Konark
    [J]. NEUROCOMPUTING, 2022, 489 : 59 - 77
  • [40] Metapath-Guided Data-Augmentation For Knowledge Graphs
    Manchanda, Saurav
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 4175 - 4179