Enhancing Text Classification Models with Generative AI-aided Data Augmentation

Cited by: 1
Authors
Zhao, Huanhuan [1]
Chen, Haihua [2]
Yoon, Hong-Jun [3]
Affiliations
[1] Univ Tennessee, Data Sci & Engn, Knoxville, TN 37996 USA
[2] Univ North Texas, Dept Informat Sci, Denton, TX USA
[3] Oak Ridge Natl Lab, Computat Sci & Engn Div, Oak Ridge, TN USA
Keywords
text classification; data augmentation; ChatGPT; imbalanced data; natural language processing; machine learning; artificial intelligence
DOI
10.1109/AITest58265.2023.00030
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study investigated whether text classification performance can be improved by augmenting the training dataset with external knowledge samples generated by a generative AI, specifically ChatGPT. Experiments were conducted on three models (CNN, HiSAN, and BERT) using the Reuters dataset. The study first evaluated the effectiveness of incorporating ChatGPT-generated samples, then analyzed the impact of sample size, sample variability, and sample length on model performance by varying the number, variety, and length of the generated samples. The models were assessed using macro- and micro-averaged scores. Macro-averaged scores improved significantly across all three models, with BERT showing the greatest improvement (macro F1 score from 49.87% to 65.73%). Adding 30 distinct samples produced better results than adding 6 duplicates of each of 5 samples; samples of 150 and 256 words performed similarly, while samples of 50 words performed slightly worse. These findings suggest that incorporating external knowledge samples generated by a generative AI is an effective way to enhance the performance of text classification models. The study also highlights that the variability of the articles generated by ChatGPT positively affected the models' accuracy, and that longer synthesized texts convey more comprehensive information on the subjects, leading to higher classification accuracy scores. Additionally, we compared our results with those obtained from EDA, a widely used data augmentation technique. The comparison shows that our method surpasses EDA and offers additional advantages by reducing computational costs and addressing the zero-shot problem. Our code is available on GitHub.
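To illustrate why the abstract reports both macro- and micro-averaged scores, the following minimal sketch computes both F1 variants on toy data. The labels, the predictions, and the helper name `f1_scores` are assumptions made for illustration; they are not taken from the paper's code or the Reuters experiments, and the sketch assumes single-label classification.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical imbalanced toy data: 9 "common" documents, 1 "rare" document.
y_true = ["common"] * 9 + ["rare"]
baseline = ["common"] * 10             # classifier never predicts "rare"
improved = ["common"] * 9 + ["rare"]   # classifier also recovers "rare"

print(f1_scores(y_true, baseline))
print(f1_scores(y_true, improved))
```

In this toy case the baseline already has a high micro F1 because the frequent class dominates the pooled counts, while its macro F1 is much lower because the rare class scores zero. Fixing the rare class therefore moves the macro score far more than the micro score, which is the kind of effect one would expect when augmentation mainly helps under-represented classes.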
Pages: 138-145
Number of pages: 8