Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Cited by: 2
Authors
Tang, Huidong [1]
Kamei, Sayaka [1]
Morimoto, Yasuhiko [1]
Affiliations
[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan
Keywords
artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement
DOI
10.3390/a16010059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models such as BERT and DistilBERT, have achieved impressive results on text classification tasks. However, the robustness of these models against adversarial attacks remains a concern. To address this, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and then exposing the resulting models to adversarial attacks. In addition to enhancing robustness, our proposed methods can improve accuracy and F1-score on three of the datasets. We also conducted comparison experiments with two existing data augmentation methods. One of our proposed methods shows a performance improvement similar to that of the existing methods, while all three show a superior robustness improvement.
Pages: 21
Related papers
50 in total
  • [41] MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation
    Dong, Zeming
    Hu, Qiang
    Guo, Yuejun
    Cordy, Maxime
    Papadakis, Mike
    Zhang, Zhenya
    Le Traon, Yves
    Zhao, Jianjun
    2023 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING, SANER, 2023, : 379 - 390
  • [42] Data Scarcity: Methods to Improve the Quality of Text Classification
    Glaser, Ingo
    Sadegharmaki, Shabnam
    Komboz, Basil
    Matthes, Florian
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 556 - 564
  • [43] Effective Data Augmentation Methods for Neural Text-to-Speech Systems
    Oh, Suhyeon
    Kwon, Ohsung
    Hwang, Min-Jae
    Kim, Jae-Min
    Song, Eunwoo
    2022 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2022,
  • [44] Classical Out-of-Distribution Detection Methods Benchmark in Text Classification Tasks
    Baran, Mateusz
    Baran, Joanna
    Wojcik, Mateusz
    Zieba, Maciej
    Gonczarek, Adam
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-SRW 2023, VOL 4, 2023, : 119 - 129
  • [45] A Comparison of Classification Methods Applied to Legal Text Data
    Araujo, Diogenes Carlos
    Lima, Alexandre
    Lima, Joao Pedro
    Costa, Jose Alfredo
    PROGRESS IN ARTIFICIAL INTELLIGENCE (EPIA 2021), 2021, 12981 : 68 - 80
  • [46] Effect of Data Augmentation Methods on Face Image Classification Results
    Hrga, Ingrid
    Ivasic-Kos, Marina
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 660 - 667
  • [47] Rethinking data augmentation for adversarial robustness
    Eghbal-zadeh, Hamid
    Zellinger, Werner
    Pintor, Maura
    Grosse, Kathrin
    Koutini, Khaled
    Moser, Bernhard A.
    Biggio, Battista
    Widmer, Gerhard
    INFORMATION SCIENCES, 2024, 654
  • [48] Data Augmentation Can Improve Robustness
    Rebuffi, Sylvestre-Alvise
    Gowal, Sven
    Calian, Dan
    Stimberg, Florian
    Wiles, Olivia
    Mann, Timothy
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [49] Enhancing Text Classification with the Universum
    Liu, Chien-Liang
    Lee, Ching-Hsien
    2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016, : 1147 - 1153
  • [50] Quantum Text Encoding for Classification Tasks
    Alexander, Aaranya
    Widdows, Dominic
    2022 IEEE/ACM 7TH SYMPOSIUM ON EDGE COMPUTING (SEC 2022), 2022, : 355 - 361