Probabilistic Interpolation with Mixup Data Augmentation for Text Classification

Authors
Xu, Rongkang [1]
Zhang, Yongcheng [1]
Ren, Kai [2]
Huang, Yu [1]
Wei, Xiaomei [1]
Affiliations
[1] Huazhong Agricultural University, College of Informatics, Wuhan 430070, People's Republic of China
[2] South-Central Minzu University, College of Computer Science, Wuhan 430074, People's Republic of China
Keywords
Text Interpolation; Data Augmentation; Probabilistic Interpolation
DOI
10.1007/978-981-97-5672-8_35
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Supervised deep learning models often suffer from insufficient training data. Mixup, a data augmentation technique, addresses this shortage by interpolating between existing samples to generate new synthetic samples. However, most current Mixup methods use linear interpolation, which can only generate synthetic data within the linear range of the sample space and therefore restricts the diversity of the synthetic samples. To overcome this limitation, we introduce PTMix, a non-linear interpolation technique. PTMix applies interpolation based on random probabilities to each dimension of the feature, which enhances the data augmentation process. This approach expands the range of the synthetic sample space and increases sample diversity while preserving high fidelity to the original data. In extensive experiments on five public text classification datasets, PTMix achieves the highest average accuracy reported to date: 86.64% under full-resource conditions and 63.84% under low-resource conditions.
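The contrast drawn in the abstract between linear interpolation and interpolation applied per feature dimension can be illustrated with a minimal sketch. The abstract does not give PTMix's actual formulation, so everything below is an assumption made for illustration only: the function names `linear_mixup` and `per_dimension_mixup` are hypothetical, the Beta(0.4, 0.4) draw is a common Mixup choice rather than a value from the paper, and the per-dimension coefficients are drawn uniformly from [0, 1]. Label mixing is omitted. It should not be read as the authors' implementation.

```python
import random


def linear_mixup(x_i, x_j, lam):
    """Standard linear Mixup on features: one scalar coefficient for all dimensions."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]


def per_dimension_mixup(x_i, x_j, rng):
    """Hypothetical per-dimension variant (an assumption, not the paper's formula):
    each feature dimension gets its own independent coefficient in [0, 1]."""
    lams = [rng.random() for _ in x_i]
    mixed = [l * a + (1.0 - l) * b for l, a, b in zip(lams, x_i, x_j)]
    return mixed, lams


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Two toy feature vectors standing in for sentence embeddings.
x_i = [1.0, 0.0, 2.0, -1.0]
x_j = [0.0, 1.0, -2.0, 3.0]

# Linear Mixup: the result lies on the straight segment between x_i and x_j.
lam = rng.betavariate(0.4, 0.4)  # assumed Beta parameters
linear = linear_mixup(x_i, x_j, lam)

# Per-dimension variant: the result can fall anywhere in the axis-aligned
# box spanned by x_i and x_j, not only on the segment between them.
mixed, lams = per_dimension_mixup(x_i, x_j, rng)

print("linear:       ", linear)
print("per-dimension:", mixed)
```

In this sketch the linear result is confined to a one-dimensional segment, while the per-dimension result can reach any point of the box between the two inputs, which is one way to read the abstract's claim of a wider synthetic sample space.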
Pages: 410-421
Page count: 12