EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

Authors
Wei, Jason [1, 2]
Zou, Kai [3]
Affiliations
[1] Protago Labs Research, Tysons Corner, VA 22182, USA
[2] Dartmouth College, Department of Computer Science, Hanover, NH 03755, USA
[3] Georgetown University, Department of Mathematics and Statistics, Washington, DC 20057, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.
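The four operations named in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation. The toy `SYNONYMS` table, the function names, and the parameters `alpha` (fraction of words touched per operation) and `num_aug` (augmented copies per sentence) are assumptions made for this example. The abstract does not say where synonyms come from or which parameter values the authors recommend. Reading random insertion as "insert a synonym of an existing word" and random deletion as "drop each word with probability `alpha`" is likewise our assumption about what the operation names mean.

```python
import random

# Hypothetical toy synonym table (assumption: the abstract does not name a synonym source).
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}


def synonym_replacement(words, n, rng):
    """Replace up to n words that appear in SYNONYMS with a randomly chosen synonym."""
    new_words = list(words)
    candidates = [i for i, w in enumerate(new_words) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        new_words[i] = rng.choice(SYNONYMS[new_words[i]])
    return new_words


def random_insertion(words, n, rng):
    """n times: insert a synonym of a random eligible word at a random position (assumed reading)."""
    new_words = list(words)
    for _ in range(n):
        candidates = [w for w in new_words if w in SYNONYMS]
        if not candidates:
            break  # nothing to draw synonyms from
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        new_words.insert(rng.randint(0, len(new_words)), synonym)
    return new_words


def random_swap(words, n, rng):
    """n times: swap the words at two distinct random positions."""
    new_words = list(words)
    if len(new_words) < 2:
        return new_words
    for _ in range(n):
        i, j = rng.sample(range(len(new_words)), 2)
        new_words[i], new_words[j] = new_words[j], new_words[i]
    return new_words


def random_deletion(words, p, rng):
    """Delete each word with probability p; if every word is deleted, keep one at random."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def eda(sentence, alpha=0.1, num_aug=4, seed=0):
    """Return num_aug augmented copies of sentence, cycling through the four operations.

    alpha and num_aug are hypothetical parameter names chosen for this sketch.
    """
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of words each operation changes
    ops = [
        lambda w: synonym_replacement(w, n, rng),
        lambda w: random_insertion(w, n, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_deletion(w, alpha, rng),
    ]
    return [" ".join(ops[k % len(ops)](words)) for k in range(num_aug)]


if __name__ == "__main__":
    for augmented in eda("the quick dog watched a happy movie"):
        print(augmented)
```

Each augmented sentence would typically be added to the training set under the label of the sentence it was derived from. Because the output depends on `seed`, fixing it makes an augmented dataset reproducible.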
Pages: 6382-6388
Page count: 7