PDA: Data Augmentation with Preposition Words on Chinese text classification

Cited by: 0
Authors
Yang, Leixin [1]
Xiong, Haoyu [1]
Xiang, Yu [1]
Affiliations
[1] Yunnan Normal Univ, Sch Informat Sci & Technol, Kunming, Yunnan, Peoples R China
Keywords
Data Augmentation; Model Generalization; Semantic Preservation; Text Classification;
DOI
10.1145/3663976.3664020
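Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification codes
0808; 0809;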
Abstract
Previous data augmentation methods have shown good results, but some are limited in applicability or are complex to apply. In this paper, we propose a simple data augmentation technique for Chinese text called Preposition Data Augmentation (PDA). The principle of PDA is straightforward: it randomly inserts Chinese prepositions into the original sentences. The augmented samples keep the original word order, so they preserve the semantic information and retain all of the input. In experiments on several Chinese text classification tasks, PDA shows an advantage over AEDA [5], EDA [13], and back-translation [11] in terms of model generalization.
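The insertion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The preposition list, the `ratio` parameter, the function name `pda_augment`, and the assumption that the sentence has already been segmented into words are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import random

# Hypothetical preposition list; the abstract does not give the list used in the paper.
PREPOSITIONS = ["在", "从", "对", "向", "把", "被", "为", "跟", "于", "按"]


def pda_augment(tokens, ratio=0.3, rng=None):
    """Return a copy of `tokens` with random prepositions inserted.

    `tokens` is a list of already-segmented words (assumption). The original
    words are never removed or reordered; prepositions are only added.
    `ratio` (assumption) sets how many insertions are made relative to the
    sentence length, with at least one insertion.
    """
    rng = rng or random.Random()
    n_insert = max(1, int(len(tokens) * ratio))
    augmented = list(tokens)  # copy so the caller's list is left unchanged
    for _ in range(n_insert):
        # Any gap is allowed, including before the first and after the last word.
        pos = rng.randint(0, len(augmented))
        augmented.insert(pos, rng.choice(PREPOSITIONS))
    return augmented


if __name__ == "__main__":
    sentence = ["我", "喜欢", "这部", "电影"]
    print("".join(pda_augment(sentence, ratio=0.5, rng=random.Random(0))))
```

Because words are only inserted, dropping the added prepositions from an augmented sample gives back the original sentence, which is the sense in which the abstract says all input information is retained.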
Pages: 4
Related papers
50 in total
  • [1] Data Augmentation with Transformers for Text Classification
    Medardo Tapia-Tellez, Jose
    Jair Escalante, Hugo
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, MICAI 2020, PT II, 2020, 12469 : 247 - 259
  • [2] A Survey on Data Augmentation for Text Classification
    Bayer, Markus
    Kaufhold, Marc-Andre
    Reuter, Christian
    ACM COMPUTING SURVEYS, 2023, 55 (07)
  • [3] CHARCNN-SVM FOR CHINESE TEXT DATASETS SENTIMENT CLASSIFICATION WITH DATA AUGMENTATION
    Wang, Xingkai
    Sheng, Yiqiang
    Deng, Haojiang
    Zhao, Zhenyu
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2019, 15 (01) : 227 - 246
  • [4] MPCNN with Knowledge Augmentation: A Model for Chinese Text Classification
    Zhang, Xiaozeng
    Fang, Ailian
    INTELLIGENT COMPUTING METHODOLOGIES, PT III, 2022, 13395 : 141 - 149
  • [5] Hierarchical Data Augmentation and the Application in Text Classification
    Yu, Shujuan
    Yang, Jie
    Liu, Danlei
    Li, Runqi
    Zhang, Yun
    Zhao, Shengmei
    IEEE ACCESS, 2019, 7 : 185476 - 185485
  • [6] Probabilistic Interpolation with Mixup Data Augmentation for Text Classification
    Xu, Rongkang
    Zhang, Yongcheng
    Ren, Kai
    Huang, Yu
    Wei, Xiaomei
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT IV, ICIC 2024, 2024, 14878 : 410 - 421
  • [7] AEDA: An Easier Data Augmentation Technique for Text Classification
    Karimi, Akbar
    Rossi, Leonardo
    Prati, Andrea
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021 : 2748 - 2754
  • [8] Tokenization-based data augmentation for text classification
    Prakrankamanant, Patawee
    Chuangsuwanich, Ekapol
    2022 19TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE 2022), 2022
  • [9] Text Smoothing: Enhance Various Data Augmentation Methods on Text Classification Tasks
    Wu, Xing
    Gao, Chaochen
    Lin, Meng
    Zang, Liangjun
    Hu, Songlin
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, 2022 : 871 - 875
  • [10] LiDA: Language-Independent Data Augmentation for Text Classification
    Sujana, Yudianto
    Kao, Hung-Yu
    IEEE ACCESS, 2023, 11 : 10894 - 10901