Enhancing Text Classification Models with Generative AI-aided Data Augmentation

Cited by: 1

Authors:

Zhao, Huanhuan [1]

Chen, Haihua [2]

Yoon, Hong-Jun [3]

Affiliations:

[1] Univ Tennessee, Data Sci & Engn, Knoxville, TN 37996 USA

[2] Univ North Texas, Dept Informat Sci, Denton, TX USA

[3] Oak Ridge Natl Lab, Computat Sci & Engn Div, Oak Ridge, TN USA

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING, AITEST | 2023

Keywords:

text classification; data augmentation; ChatGPT; imbalanced data; natural language processing; machine learning; artificial intelligence;

DOI:

10.1109/AITest58265.2023.00030

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This study investigated the potential of enhancing the performance of text classification by augmenting the training dataset with external knowledge samples generated by a generative AI, specifically ChatGPT. The study conducted experiments on three models (CNN, HiSAN, and BERT) using the Reuters dataset. First, the study evaluated the effectiveness of incorporating ChatGPT-generated samples, and then analyzed the impact of sample size, sample variability, and sample length on the models' performance by varying the number, variety, and length of the generated samples. The models were assessed using macro- and micro-averaged scores, and the results revealed that the macro-averaged scores improved significantly across all three models, with the BERT model showing the greatest improvement (from 49.87% to 65.73% in macro F1 score). The study further found that adding 30 distinct samples produced better results than adding 6 duplicates of 5 samples, and that samples with 150 and 256 words had similar performance, while those with 50 words performed slightly worse. These findings suggest that incorporating external knowledge samples generated by a generative AI is an effective approach to enhancing the performance of text classification models. The study also highlights that the variability of articles generated by ChatGPT positively impacted the models' accuracy, and that longer synthesized texts convey more comprehensive information on the subjects, leading to higher classification accuracy scores. Additionally, we compared our results with those obtained from EDA, a widely used data augmentation generator. The findings clearly demonstrate that our method surpasses EDA and offers additional advantages by reducing computational costs and addressing the zero-shot problem. Our code is available on GitHub.
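The abstract reports gains mainly in macro-averaged scores, which weight every class equally, whereas micro-averaged scores are dominated by frequent classes. The following is a minimal sketch of that difference, not the authors' evaluation code. It assumes single-label predictions (the record does not say how the Reuters labels were handled), and the function name `f1_scores`, the class names, and the toy data are hypothetical.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical imbalanced toy data: a classifier that always predicts
# the majority class and therefore never recovers the minority class.
y_true = ["majority"] * 8 + ["minority"] * 2
y_pred = ["majority"] * 10

micro, macro = f1_scores(y_true, y_pred)
print(f"micro F1 = {micro:.4f}, macro F1 = {macro:.4f}")
```

In this toy case the micro score stays high while the macro score is pulled down by the missed minority class, which is why improvements on under-represented classes show up most clearly in macro-averaged metrics.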


Pages: 138-145

Page count: 8

50 items in total

[1] A Generative Adversarial Network for AI-Aided Chair Design
Liu, Zhibo
Gao, Feng
Wang, Yizhou
2019 2ND IEEE CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2019), 2019 : 486 - 490
[2] A TASK-DECOMPOSED AI-AIDED APPROACH FOR GENERATIVE CONCEPTUAL DESIGN
Wang, Boheng
Zuo, Haoyu
Cai, Zebin
Yin, Yuan
Childs, Peter
Sun, Lingyun
Chen, Liuqing
PROCEEDINGS OF ASME 2023 INTERNATIONAL DESIGN ENGINEERING TECHNICAL CONFERENCES AND COMPUTERS AND INFORMATION IN ENGINEERING CONFERENCE, IDETC-CIE2023, VOL 6, 2023
[3] Data Augmentation Methods for Enhancing Robustness in Text Classification Tasks
Tang, Huidong
Kamei, Sayaka
Morimoto, Yasuhiko
ALGORITHMS, 2023, 16 (01)
[4] OpticGAI: Generative AI-aided Deep Reinforcement Learning for Optical Networks Optimization
Li, Siyuan
Lin, Xi
Liu, Yaju
Li, Gaolei
Li, Jianhua
PROCEEDINGS OF THE 1ST SIGCOMM WORKSHOP ON HOT TOPICS IN OPTICAL TECHNOLOGIES AND APPLICATIONS IN NETWORKING, HOTOPTICS 2024, 2024 : 1 - 6
[5] GeoNLPlify: A spatial data augmentation enhancing text classification for crisis monitoring
Decoupes, Remy
Roche, Mathieu
Teisseire, Maguelonne
INTELLIGENT DATA ANALYSIS, 2024, 28 (02) : 507 - 531
[6] AI-aided Data Mining in Gut Microbiome: The Road to Precision Medicine
Jiang, Xiaoqing
Xu, Congmin
Guo, Qian
Zhu, Huaiqiu
2021 14TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2021), 2021
[7] Enhanced Data Augmentation for Infrared Images With Generative Adversarial Networks Aided by Pretrained Models
Wang, Yan
Deng, Lianbing
IEEE ACCESS, 2024, 12 : 176739 - 176750
[8] Energy-Efficient Resource Allocation in Generative AI-Aided Secure Semantic Mobile Networks
Zheng, Jie
Du, Baoxia
Du, Hongyang
Kang, Jiawen
Niyato, Dusit
Zhang, Haijun
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (12) : 11422 - 11435
[9] Data Augmentation with Transformers for Text Classification
Medardo Tapia-Tellez, Jose
Jair Escalante, Hugo
ADVANCES IN COMPUTATIONAL INTELLIGENCE, MICAI 2020, PT II, 2020, 12469 : 247 - 259
[10] A Survey on Data Augmentation for Text Classification
Bayer, Markus
Kaufhold, Marc-Andre
Reuter, Christian
ACM COMPUTING SURVEYS, 2023, 55 (07)
