PDA: Data Augmentation with Preposition Words on Chinese text classification

Times cited: 0
Authors
Yang, Leixin [1]
Xiong, Haoyu [1]
Xiang, Yu [1]
Affiliation
[1] Yunnan Normal Univ, Sch Informat Sci & Technol, Kunming, Yunnan, Peoples R China
Keywords
Data Augmentation; Model Generalization; Semantic Preservation; Text Classification
DOI
10.1145/3663976.3664020
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline code
0808; 0809
Abstract
While previous data augmentation methods have shown good results, some of them are limited in applicability or add complexity. In this paper, we propose a simple data augmentation technique for Chinese text called Preposition Data Augmentation (PDA). PDA randomly inserts Chinese prepositions into the original sentences. Because no token is deleted or reordered, the augmented samples keep the original word order, preserving the semantic information and retaining all of the input. Experiments on various Chinese text classification tasks show that PDA offers an advantage in model generalization over AEDA [5], EDA [13], and back-translation [11].
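The abstract describes PDA only at a high level, so the following is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes character-level tokens, a hypothetical preposition list (`PREPOSITIONS`), and an AEDA-style rule that draws the number of insertions between 1 and one third of the sentence length (`max_ratio`); the function name `preposition_augment` is likewise illustrative. Original tokens are never removed or reordered, which is the property the abstract emphasizes.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical preposition inventory (assumption); the paper's actual list is not given here.
PREPOSITIONS: Sequence[str] = ("在", "对", "把", "被", "从", "向", "对于", "关于", "按照", "通过")


def preposition_augment(
    tokens: List[str],
    prepositions: Sequence[str] = PREPOSITIONS,
    max_ratio: float = 1 / 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a copy of `tokens` with randomly chosen prepositions inserted.

    Input tokens are never deleted, replaced, or reordered, so every one of
    them appears in the output in its original relative order.
    """
    rng = rng or random.Random()
    if not tokens:
        return []
    # Number of insertions: 1 to max_ratio * len(tokens) (AEDA-style assumption).
    upper = max(1, int(len(tokens) * max_ratio))
    n_insert = rng.randint(1, upper)
    # Pick distinct token positions; a preposition is inserted before each one.
    slots = set(rng.sample(range(len(tokens)), n_insert))
    augmented: List[str] = []
    for i, tok in enumerate(tokens):
        if i in slots:
            augmented.append(rng.choice(prepositions))
        augmented.append(tok)
    return augmented


if __name__ == "__main__":
    sentence = "这部电影非常好看"
    # Character-level tokenization is an assumption; a word segmenter could be used instead.
    print("".join(preposition_augment(list(sentence), rng=random.Random(0))))
```

Inserting before distinct positions keeps the sketch simple and guarantees at most one new preposition per gap; a word-level tokenizer or a different insertion budget would be equally plausible readings of the abstract.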
Pages: 4