Text Data Augmentation for Deep Learning

Cited by: 186
Authors
Shorten, Connor [1]
Khoshgoftaar, Taghi M. [1]
Furht, Borko [1]
Affiliations
[1] Florida Atlantic Univ, 777 Glades Rd, Boca Raton, FL 33431 USA
Keywords
Data Augmentation; Natural Language Processing; Overfitting; Big Data; NLP; Text Data
DOI
10.1186/s40537-021-00492-0
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation summarized into strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation such as the use of consistency regularization, controllers, and offline and online augmentation pipelines, to preview a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.
Pages: 34
Related papers
  • [1] Text Data Augmentation for Deep Learning
    Connor Shorten
    Taghi M. Khoshgoftaar
    Borko Furht
    [J]. Journal of Big Data, 8
  • [2] Data Augmentation With Semantic Enrichment for Deep Learning Invoice Text Classification
    Chi, Wei Wen
    Tang, Tiong Yew
    Salleh, Narishah Mohamed
    Mukred, Muaadh
    Alsalman, Hussain
    Zohaib, Muhammad
    [J]. IEEE ACCESS, 2024, 12 : 57326 - 57344
  • [3] Deep ensemble transfer learning framework for COVID-19 Arabic text identification via deep active learning and text data augmentation
    Muaad, Abdullah Y.
    Davanagere, Hanumanthappa Jayappa
    Hussain, Jamil
    Al-antari, Mugahed A.
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (33) : 79337 - 79375
  • [4] Data Augmentation for Bayesian Deep Learning
    Wang, Yuexi
    Polson, Nicholas
    Sokolov, Vadim O.
    [J]. BAYESIAN ANALYSIS, 2023, 18 (04) : 1041 - 1069
  • [5] Supervised text data augmentation method for deep neural networks
    Seol, Jaehwan
    Jung, Jieun
    Choi, Yeonseok
    Choi, Yong-Seok
    [J]. COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS, 2023, 30 (03) : 343 - 354
  • [6] Augmentation and Evaluation of Training Data for Deep Learning
    Ding, Junhua
    Li, XinChuan
    Gudivada, Venkat N.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2017, : 2603 - 2611
  • [7] Improving Deep Learning with Generic Data Augmentation
    Taylor, Luke
    Nitschke, Geoff
    [J]. 2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 1542 - 1547
  • [8] Data Augmentation for Deep Learning of Judgment Documents
    Yan, Ge
    Li, Yu
    Zhang, Shu
    Chen, Zhenyu
    [J]. INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: BIG DATA AND MACHINE LEARNING, PT II, 2019, 11936 : 232 - 242
  • [9] A survey on Image Data Augmentation for Deep Learning
    Shorten, Connor
    Khoshgoftaar, Taghi M.
    [J]. JOURNAL OF BIG DATA, 2019, 6 (01)