PDA: Data Augmentation with Preposition Words on Chinese Text Classification

Cited by: 0
Authors
Yang, Leixin [1]
Xiong, Haoyu [1]
Xiang, Yu [1]
Affiliations
[1] Yunnan Normal Univ, Sch Informat Sci & Technol, Kunming, Yunnan, Peoples R China
Keywords
Data Augmentation; Model Generalization; Semantic Preservation; Text Classification
DOI
10.1145/3663976.3664020
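Chinese Library Classification (CLC) codes
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract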
Previous data augmentation methods have shown good results, but some of them are limited in applicability or are complex to apply. In this paper, we propose a simple data augmentation technique for Chinese text called Preposition Data Augmentation (PDA). PDA randomly inserts Chinese prepositions into the original sentences. The augmented samples keep the original word order, so they preserve the semantic information and retain all of the input. In experiments on several Chinese text classification tasks, PDA shows some advantage over AEDA [5], EDA [13], and back-translation [11] in terms of model generalization.
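The insertion step described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the preposition list `PREPOSITIONS`, the function name `pda_augment`, the `ratio` parameter, and the rule for choosing how many prepositions to insert are all assumptions, since the abstract specifies none of them. The input is assumed to be already segmented into tokens.

```python
import random

# Assumed preposition inventory; the abstract does not list the words PDA uses.
PREPOSITIONS = ["在", "从", "对", "把", "被", "向", "给", "于"]


def pda_augment(tokens, ratio=0.3, rng=None):
    """Return a copy of `tokens` with random prepositions inserted.

    The original tokens are never removed or reordered, so the input
    is always recoverable as a subsequence of the output.
    """
    if not tokens:
        return []
    rng = rng or random.Random()
    # Assumed rule: insert between 1 and ratio * len(tokens) prepositions.
    max_inserts = max(1, int(ratio * len(tokens)))
    n_inserts = rng.randint(1, max_inserts)
    augmented = list(tokens)
    for _ in range(n_inserts):
        # Any gap is allowed, including before the first and after the last token.
        position = rng.randint(0, len(augmented))
        augmented.insert(position, rng.choice(PREPOSITIONS))
    return augmented


if __name__ == "__main__":
    sentence = ["我", "喜欢", "这部", "电影"]  # hypothetical pre-segmented input
    print("".join(pda_augment(sentence, ratio=0.5)))
```

Because the function only inserts tokens, the label of the original sample can be reused for the augmented one. Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible.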
Pages: 4