Enhancing Text Classification Models with Generative AI-aided Data Augmentation

Cited by: 1
Authors
Zhao, Huanhuan [1]
Chen, Haihua [2]
Yoon, Hong-Jun [3]
Affiliations
[1] Univ Tennessee, Data Sci & Engn, Knoxville, TN 37996 USA
[2] Univ North Texas, Dept Informat Sci, Denton, TX USA
[3] Oak Ridge Natl Lab, Computat Sci & Engn Div, Oak Ridge, TN USA
Keywords
text classification; data augmentation; ChatGPT; imbalanced data; natural language processing; machine learning; artificial intelligence
DOI
10.1109/AITest58265.2023.00030
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study investigated whether text classification performance can be improved by augmenting the training dataset with external knowledge samples generated by a generative AI, specifically ChatGPT. Experiments were conducted on three models (CNN, HiSAN, and BERT) using the Reuters dataset. The study first evaluated the effectiveness of incorporating ChatGPT-generated samples, then analyzed the impact of sample size, sample variability, and sample length on model performance by varying the number, variety, and length of the generated samples. The models were assessed using macro- and micro-averaged scores. The macro-averaged scores improved significantly across all three models, with BERT showing the greatest improvement (from 49.87% to 65.73% in macro F1 score). Adding 30 distinct samples produced better results than adding 6 duplicates of 5 samples; samples of 150 and 256 words performed similarly, while samples of 50 words performed slightly worse. These findings suggest that incorporating external knowledge samples generated by a generative AI is an effective way to enhance the performance of text classification models. The study also highlights that the variability of the articles generated by ChatGPT positively affected the models' accuracy, and that longer synthesized texts convey more comprehensive information on the subjects, leading to higher classification accuracy scores. Additionally, the results were compared with those obtained from EDA, a widely used data augmentation technique. The comparison shows that the proposed method outperforms EDA and offers additional advantages by reducing computational costs and addressing the zero-shot problem. The code is available on GitHub.
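The abstract reports that macro-averaged scores moved much more than micro-averaged ones. The sketch below illustrates why that can happen on imbalanced data. It is not the authors' evaluation code: the function name f1_scores, the labels, and the toy predictions are made up for illustration, and it assumes single-label classification for simplicity, which may not match the setup used in the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical imbalanced toy data: 8 documents of a frequent class, 2 of a rare one.
y_true = ["earn"] * 8 + ["cocoa"] * 2
baseline = ["earn"] * 10                      # rare class is never predicted
improved = ["earn"] * 8 + ["cocoa", "earn"]   # one rare document is recovered

for name, y_pred in [("baseline", baseline), ("improved", improved)]:
    macro, micro = f1_scores(y_true, y_pred)
    print(f"{name}: macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy case, recovering a single rare-class document raises micro F1 from 0.800 to 0.900 but macro F1 from about 0.444 to about 0.804, because macro averaging weights the rare class as heavily as the frequent one.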
Pages: 138-145
Number of pages: 8