Text Data Augmentation for Deep Learning

Cited by: 186

Authors:

Shorten, Connor [1]

Khoshgoftaar, Taghi M. [1]

Furht, Borko [1]

Affiliations:

[1] Florida Atlantic Univ, 777 Glades Rd, Boca Raton, FL 33431 USA

Source:

Journal of Big Data | 2021 / Volume 8 / Issue 1

Keywords:

Data Augmentation; Natural Language Processing; Overfitting; Big Data; NLP; Text Data

DOI:

10.1186/s40537-021-00492-0

Chinese Library Classification (CLC) number:

TP301 [Theory and Methods]

Discipline classification code:

081202

Abstract:

Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning. In this survey, we consider how the Data Augmentation training strategy can aid in its development. We begin with the major motifs of Data Augmentation, summarized as strengthening local decision boundaries, brute force training, causality and counterfactual examples, and the distinction between meaning and form. We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data. Deep Learning generally struggles with the measurement of generalization and the characterization of overfitting. We highlight studies that cover how augmentations can construct test sets for generalization. NLP is at an early stage in applying Data Augmentation compared to Computer Vision. We highlight the key differences and promising ideas that have yet to be tested in NLP. For the sake of practical implementation, we describe tools that facilitate Data Augmentation, such as consistency regularization, controllers, and offline and online augmentation pipelines, to name a few. Finally, we discuss interesting topics around Data Augmentation in NLP such as task-specific augmentations, the use of prior knowledge in self-supervised learning versus Data Augmentation, intersections with transfer and multi-task learning, and ideas for AI-GAs (AI-Generating Algorithms). We hope this paper inspires further research interest in Text Data Augmentation.
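
The abstract distinguishes offline from online augmentation pipelines. The sketch below is a rough illustration of that distinction only; it is not taken from the paper. The function names (`random_deletion`, `random_swap`, `augment`, `offline_augment`, `online_batches`), the whitespace tokenization, and the default deletion probability are all assumptions made for this example. Note that token-level perturbations like these can occasionally change a sentence's label, which the sketch does not check.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, rng):
    """Swap two randomly chosen positions (no-op for fewer than two tokens)."""
    if len(tokens) < 2:
        return list(tokens)
    i, j = rng.sample(range(len(tokens)), 2)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


def augment(text, rng, p_delete=0.1):
    """Apply one randomly chosen perturbation to a whitespace-tokenized sentence."""
    tokens = text.split()
    if not tokens:
        return text
    if rng.random() < 0.5:
        tokens = random_deletion(tokens, p_delete, rng)
    else:
        tokens = random_swap(tokens, rng)
    return " ".join(tokens)


def offline_augment(dataset, copies, seed=0):
    """Offline pipeline: build the enlarged dataset once, before training."""
    rng = random.Random(seed)
    out = list(dataset)  # keep the original examples
    for text, label in dataset:
        out.extend((augment(text, rng), label) for _ in range(copies))
    return out


def online_batches(dataset, epochs, seed=0):
    """Online pipeline: draw fresh augmentations every epoch, on the fly."""
    rng = random.Random(seed)
    for _ in range(epochs):
        yield [(augment(text, rng), label) for text, label in dataset]
```

In this sketch the offline variant stores a fixed, enlarged dataset (original examples plus `copies` perturbed versions of each), while the online variant keeps the dataset size constant and re-samples the perturbation each epoch, trading storage for per-step compute.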


Pages: 34
