Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks

Cited by: 2

Authors:

Tang, Huidong [1]

Kamei, Sayaka [1]

Morimoto, Yasuhiko [1]

Affiliation:

[1] Hiroshima Univ, Grad Sch Adv Sci & Engn, Kagamiyama 1-7-1, Higashihiroshima 7398521, Japan

Source:

ALGORITHMS | 2023 / Vol. 16 / Issue 01

Keywords:

artificial intelligence; natural language processing; text classification; data augmentation; robustness improvement;

DOI:

10.3390/a16010059

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text classification is widely studied in natural language processing (NLP). Deep learning models, including large pre-trained models like BERT and DistilBERT, have achieved impressive results in text classification tasks. However, these models' robustness against adversarial attacks remains an area of concern. To address this concern, we propose three data augmentation methods to improve the robustness of such pre-trained models. We evaluated our methods on four text classification datasets by fine-tuning DistilBERT on the augmented datasets and exposing the resulting models to adversarial attacks to evaluate their robustness. In addition to enhancing the robustness, our proposed methods can improve the accuracy and F1-score on three datasets. We also conducted comparison experiments with two existing data augmentation methods. We found that one of our proposed methods demonstrates a similar improvement in terms of performance, but all demonstrate a superior robustness improvement.


Pages: 21
