Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Cited by: 2
Authors
Tang, Huidong [1]
Kamei, Sayaka [1]
Morimoto, Yasuhiko [1]
Affiliations
[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan
Keywords
artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement
DOI
10.3390/a16010059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models such as BERT and DistilBERT, have achieved impressive results in text classification tasks. However, the robustness of these models against adversarial attacks remains a concern. To address it, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and then exposing the resulting models to adversarial attacks to measure their robustness. In addition to enhancing robustness, our proposed methods improve accuracy and F1-score on three of the datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods achieves a performance improvement similar to that of the existing methods, while all three achieve a superior robustness improvement.
Pages: 21
Related papers
50 in total
  • [21] LiDA: Language-Independent Data Augmentation for Text Classification
    Sujana, Yudianto
    Kao, Hung-Yu
    IEEE ACCESS, 2023, 11 : 10894 - 10901
  • [22] PDA: Data Augmentation with Preposition Words on Chinese text classification
    Yang, Leixin
    Xiong, Haoyu
    Xiang, Yu
    2024 2ND ASIA CONFERENCE ON COMPUTER VISION, IMAGE PROCESSING AND PATTERN RECOGNITION, CVIPPR 2024, 2024
  • [23] Data augmentation and adversary attack on limit resources text classification
    Fernando Sánchez-Vega
    A. Pastor López-Monroy
    Antonio Balderas-Paredes
    Luis Pellegrin
    Alejandro Rosales-Pérez
    Multimedia Tools and Applications, 2025, 84 (3) : 1317 - 1344
  • [24] TextANN: An Improved Text Classification Model Based on Data Augmentation
    Li, Hong
    Yang, Xiaosheng
    Yang, Guoqing
    Ouyang, Xiaogang
    Chen, Yu
    Wang, Xueqing
    2018 INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, BIG DATA AND BLOCKCHAIN (ICCBB 2018), 2018 : 160 - 163
  • [25] A Submodular Optimization Framework for Imbalanced Text Classification With Data Augmentation
    Alemayehu, Eyor
    Fang, Yi
    IEEE ACCESS, 2023, 11 : 41680 - 41696
  • [26] Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification
    Pruksachatkun, Yada
    Krishna, Satyapriya
    Dhamala, Jwala
    Gupta, Rahul
    Chang, Kai Wei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021 : 3320 - 3331
  • [27] Enhancing Endoscopic Image Classification with Symptom Localization and Data Augmentation
    Trung-Hieu Hoang
    Hai-Dang Nguyen
    Viet-Anh Nguyen
    Thanh-An Nguyen
    Vinh-Tiep Nguyen
    Minh-Triet Tran
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019 : 2578 - 2582
  • [28] Contrastive learning with text augmentation for text classification
    Jia, Ouyang
    Huang, Huimin
    Ren, Jiaxin
    Xie, Luodi
    Xiao, Yinyin
    APPLIED INTELLIGENCE, 2023, 53 (16) : 19522 - 19531
  • [30] Exploring Image Classification Robustness and Interpretability with Right for the Right Reasons Data Augmentation
    Oliveira Santos, Flavio Arthur
    Zanchettin, Cleber
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023 : 4149 - 4158