EasyAug: An Automatic Textual Data Augmentation Platform for Classification Tasks

Cited by: 25
Authors
Qiu, Siyuan [1]
Xu, Binxia [1]
Zhang, Jie [1]
Wang, Yafang [1]
Shen, Xiaoyu [2]
de Melo, Gerard [3]
Long, Chong [1]
Li, Xiaolong [1]
Affiliations
[1] Ant Financial Services Group, Hangzhou, People's Republic of China
[2] Max Planck Institute for Informatics, Saarbrücken, Germany
[3] Rutgers, The State University of New Jersey, New Brunswick, NJ, USA
Keywords
imbalanced data; data augmentation; text generation; model fusion; text classification
DOI
10.1145/3366424.3383552
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Imbalanced data is a perennial problem that impedes the learning abilities of current machine learning-based classification models. One approach to address it is to leverage data augmentation to expand the training set. For image data, there are a number of suitable augmentation techniques that have proven effective in previous work. For textual data, however, due to the discrete units inherent in natural language, techniques that randomly perturb the signal may be ineffective. Additionally, due to the substantial discrepancy between different textual datasets (e.g., different domains), an augmentation approach that facilitates the classification on one dataset may be detrimental on another dataset. For practitioners, comparing different data augmentation techniques is non-trivial, as the corresponding methods might need to be incorporated into different system architectures, and the implementation of some approaches, such as generative models, is laborious. To address these challenges, we develop EasyAug, a data augmentation platform that provides several augmentation approaches. Users can conveniently compare the classification results and can easily choose the most suitable one for their own dataset. In addition, the system is extensible and can incorporate further augmentation approaches, such that with minimal effort a new method can comprehensively be compared with the baselines.
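The workflow the abstract describes (several augmentation approaches behind one interface, each used to expand an imbalanced training set, with classification results compared side by side) can be illustrated with a minimal sketch. This is not EasyAug's actual code or API: the registry `AUGMENTERS`, the functions `register`, `balance`, and `compare`, and the two toy token-level perturbations are all hypothetical names chosen for illustration, and the classifier is left to a user-supplied `evaluate` callback.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]                       # (text, label)
Augmenter = Callable[[str, random.Random], str]

# Hypothetical registry of augmentation methods (an assumption, not EasyAug's API).
AUGMENTERS: Dict[str, Augmenter] = {}


def register(name: str) -> Callable[[Augmenter], Augmenter]:
    """Decorator that adds an augmentation function to the registry."""
    def wrap(fn: Augmenter) -> Augmenter:
        AUGMENTERS[name] = fn
        return fn
    return wrap


@register("identity")
def identity(text: str, rng: random.Random) -> str:
    """Baseline: plain oversampling, the text is copied unchanged."""
    return text


@register("random_swap")
def random_swap(text: str, rng: random.Random) -> str:
    """Swap the tokens at two randomly chosen positions."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)


@register("random_delete")
def random_delete(text: str, rng: random.Random) -> str:
    """Drop one randomly chosen token, always keeping at least one."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    del tokens[rng.randrange(len(tokens))]
    return " ".join(tokens)


def balance(data: List[Example], augment: Augmenter, seed: int = 0) -> List[Example]:
    """Append augmented copies until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in data)
    target = max(counts.values())
    out = list(data)
    for label, n in counts.items():
        pool = [text for text, lab in data if lab == label]
        for _ in range(target - n):
            out.append((augment(rng.choice(pool), rng), label))
    return out


def compare(train: List[Example],
            evaluate: Callable[[List[Example]], float],
            seed: int = 0) -> Dict[str, float]:
    """Score each registered method with the caller's train-and-evaluate callback."""
    return {name: evaluate(balance(train, fn, seed)) for name, fn in AUGMENTERS.items()}
```

Under these assumptions, adding a new method amounts to decorating one more function with `@register(...)`, after which `compare` scores it alongside the baselines using whatever classifier and metric `evaluate` wraps.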
Pages: 249-252
Page count: 4
Related papers
  • [1] EASY DATA AUGMENTATION METHOD FOR CLASSIFICATION TASKS
    Liu Guohang
    Zhang Shibin
    Tang Haozhe
    Yang Lu
    Lu Jiazhong
    Huang Yuanyuan
    [J]. 2020 17TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2020: 166-169
  • [2] Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks
    Tang, Huidong
    Kamei, Sayaka
    Morimoto, Yasuhiko
    [J]. ALGORITHMS, 2023, 16 (01)
  • [3] Automatic modulation classification based on AlexNet with data augmentation
    Chengchang, Zhang
    Yu, Xu
    Jianpeng, Yang
    Xiaomeng, Li
    [J]. Journal of China Universities of Posts and Telecommunications, 2022, 29(05): 51-61
  • [4] Data Augmentation with Conditional GAN for Automatic Modulation Classification
    Patel, Mansi
    Wang, Xuyu
    Mao, Shiwen
    [J]. PROCEEDINGS OF THE 2ND ACM WORKSHOP ON WIRELESS SECURITY AND MACHINE LEARNING, WISEML 2020, 2020: 31-36
  • [5] Automatic modulation classification based on Alex Net with data augmentation
    Zhang Chengchang
    Xu Yu
    Yang Jianpeng
    Li Xiaomeng
    [J]. The Journal of China Universities of Posts and Telecommunications, 2022, 29(05): 51-61
  • [6] Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks
    Wu, Xing
    Gao, Chaochen
    Lin, Meng
    Zang, Liangjun
    Hu, Songlin
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, 2022: 871-875
  • [7] Data augmentation using virtual word insertion techniques in text classification tasks
    Long, Zhigao
    Li, Hong
    Shi, Jiawen
    Ma, Xin
    [J]. EXPERT SYSTEMS, 2024, 41 (04)
  • [8] EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks
    Wei, Jason
    Zou, Kai
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 6382-6388
  • [9] Iterative Translation-Based Data Augmentation Method for Text Classification Tasks
    Lee, Sangwon
    Liu, Ling
    Choi, Wonik
    [J]. IEEE ACCESS, 2021, 9: 160437-160445
  • [10] Data Augmentation using Style Transfer in SAR Automatic Target Classification
    Zhu, Xu
    Mori, Hiroki
    [J]. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING IN DEFENSE APPLICATIONS III, 2021, 11870