Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Cited by: 2
Authors
Tang, Huidong [1]
Kamei, Sayaka [1]
Morimoto, Yasuhiko [1]
Affiliations
[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan
Keywords
artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement
DOI
10.3390/a16010059
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models like BERT and DistilBERT, have achieved impressive results in text classification tasks. However, these models' robustness against adversarial attacks remains an area of concern. To address this concern, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to evaluate their robustness. In addition to enhancing the robustness, our proposed methods can improve the accuracy and F1-score on three datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods demonstrates a similar improvement in terms of performance, but all demonstrate a superior robustness improvement.
Pages: 21
Related papers
50 in total
  • [31] Data Augmentation With Semantic Enrichment for Deep Learning Invoice Text Classification
    Chi, Wei Wen
    Tang, Tiong Yew
    Salleh, Narishah Mohamed
    Mukred, Muaadh
    Alsalman, Hussain
    Zohaib, Muhammad
    IEEE ACCESS, 2024, 12 : 57326 - 57344
  • [32] Text Data Augmentation Techniques for Word Embeddings in Fake News Classification
    Kapusta, Jozef
    Drzik, David
    Steflovic, Kirsten
    Nagy, Kitti Szabo
    IEEE ACCESS, 2024, 12 : 31538 - 31550
  • [33] Nonlinear Mixup: Out-Of-Manifold Data Augmentation for Text Classification
    Guo, Hongyu
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 4044 - 4051
  • [34] Explainable Text Classification via Attentive and Targeted Mixing Data Augmentation
    Jiang, Songhao
    Chu, Yan
    Wang, Zhengkui
    Ma, Tianxing
    Wang, Hanlin
    Lu, Wenxuan
    Zang, Tianning
    Wang, Bo
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023 : 5085 - 5094
  • [35] Heavy-tailed Representations, Text Polarity Classification & Data Augmentation
    Jalalzai, Hamid
    Colombo, Pierre
    Clavel, Chloe
    Gaussier, Eric
    Varni, Giovanna
    Vignon, Emmanuel
    Sabourin, Anne
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [36] RobustMixGen: Data augmentation for enhancing robustness of visual–language models in the presence of distribution shift
    Kim, Sunwoo
    Im, Hun
    Lee, Woojun
    Lee, Seonggye
    Kang, Pilsung
    Neurocomputing, 2025, 619
  • [37] Enhancing ALPR: a two stage YOLO model with data augmentation for improved accuracy and robustness
    Bansal, Swati
    Jain, Abhilasha
    Sharma, Manoj
    Kumar, Gautam
    Ojha, Shivam
    Walia, Hemant
    Multimedia Tools and Applications, 2024, 83 (37) : 84933 - 84952
  • [38] Enhancing text categorization with semantic-enriched representation and training data augmentation
    Lu, Xinghua
    Zheng, Bin
    Velivelli, Atulya
    Zhai, ChengXiang
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2006, 13 (05) : 526 - 535
  • [39] MIXCODE: Enhancing Code Classification by Mixup-Based Data Augmentation
    Dong, Zeming
    Hu, Qiang
    Guo, Yuejun
    Cordy, Maxime
    Papadakis, Mike
    Zhang, Zhenya
    Le Traon, Yves
    Zhao, Jianjun
    arXiv, 2022
  • [40] MixCode: Enhancing Code Classification by Mixup-Based Data Augmentation
    Dong, Zeming
    Hu, Qiang
    Guo, Yuejun
    Cordy, Maxime
    Papadakis, Mike
    Zhang, Zhenya
    Le Traon, Yves
    Zhao, Jianjun
    Proceedings - 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2023, 2023 : 379 - 390