Hierarchical Data Augmentation and the Application in Text Classification

Cited by: 14
Authors
Yu, Shujuan [1]
Yang, Jie [1]
Liu, Danlei [1]
Li, Runqi [1]
Zhang, Yun [1]
Zhao, Shengmei [2]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Elect & Opt Engn, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Telecommun & Informat Engn, Nanjing 210003, Peoples R China
Source
IEEE ACCESS | 2019 / Vol. 7
Funding
National Natural Science Foundation of China
Keywords
Attention mechanism; data augmentation; natural language processing; text classification
DOI
10.1109/ACCESS.2019.2960263
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The applications of data augmentation in natural language processing have been limited. In this paper, we propose a novel method named Hierarchical Data Augmentation (HDA), which is applied to text classification. Firstly, inspired by the hierarchical structure of texts, in which words form a sentence and sentences form a document, HDA implements a hierarchical data augmentation strategy by augmenting texts at the word level and the sentence level respectively. Secondly, inspired by cropping, a popular method of data augmentation in computer vision, at each augmenting level HDA utilizes an attention mechanism to distill (crop) important contents from texts hierarchically as summaries of the texts. Specifically, we use a trained Hierarchical Attention Networks (HAN) model to obtain the attention values of all documents in the training sets at both levels, which are then used to extract the most important words/sentences and generate new samples by concatenating them in order. We thus obtain two levels of augmented datasets, WordSet and SentSet. Finally, we extend the training set with a certain amount of HDA-generated samples and evaluate the models' performance with the new training set. The results reveal that HDA can generate massive and high-quality augmented samples at both levels, and that models using these samples can obtain significant improvements. Compared with the existing methods, HDA enjoys simplicity in both theory and implementation, and it can augment texts at two levels for the diversity of data.
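The cropping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `crop_by_attention`, the `keep_ratio` parameter, and the attention weights below are all hypothetical, and the weights are hand-picked rather than produced by a trained HAN model. The sketch only shows the general idea of keeping the highest-attention units (words or sentences) and concatenating them in their original order.

```python
import math
from typing import List, Sequence


def crop_by_attention(units: Sequence[str],
                      weights: Sequence[float],
                      keep_ratio: float = 0.5) -> List[str]:
    """Keep the highest-weighted units, preserving their original order.

    `keep_ratio` is an illustrative assumption; the abstract does not say
    how many words or sentences HDA retains.
    """
    if len(units) != len(weights):
        raise ValueError("units and weights must have the same length")
    if not units:
        return []
    k = max(1, math.ceil(keep_ratio * len(units)))
    # Rank positions by descending weight; ties go to the earlier position.
    ranked = sorted(range(len(units)), key=lambda i: (-weights[i], i))
    # Re-sort the selected positions so the output follows the source order.
    kept = sorted(ranked[:k])
    return [units[i] for i in kept]


# Word level: hand-picked weights standing in for word attention values.
words = ["the", "movie", "was", "truly", "wonderful", "overall"]
word_weights = [0.05, 0.25, 0.05, 0.15, 0.40, 0.10]
word_sample = " ".join(crop_by_attention(words, word_weights))

# Sentence level: hand-picked weights standing in for sentence attention values.
sentences = ["I went on Friday.", "The plot was gripping.",
             "Parking was easy.", "The acting was superb."]
sent_weights = [0.10, 0.40, 0.05, 0.45]
sent_sample = " ".join(crop_by_attention(sentences, sent_weights))

print(word_sample)  # movie truly wonderful
print(sent_sample)  # The plot was gripping. The acting was superb.
```

In this sketch, word-level outputs would play the role of WordSet samples and sentence-level outputs the role of SentSet samples; each new sample would keep the label of the document it was cropped from.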
Pages: 185476-185485
Page count: 10