3Rs: Data Augmentation Techniques Using Document Contexts For Low-Resource Chinese Named Entity Recognition

Cited by: 0
Authors
Ying, Zheyu [1, 2]
Zhang, Jinglei [1, 2]
Xie, Rui [1]
Wen, Guochang [1, 2]
Xiao, Feng [1, 2]
Liu, Xueyang [1]
Zhang, Shikun [1]
Affiliations
[1] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China
[2] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
Keywords
Chinese NER; Data Augmentation; Document-Level; Adversarial Attack; Low-resource;
DOI
10.1109/IJCNN55064.2022.9892341
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With recent advances in neural networks and pre-training techniques, Chinese Named Entity Recognition (NER) has made great progress. However, NER systems still generalize poorly when annotated data is scarce, and current NER models mostly consider input sentences individually, which prevents them from exploiting cross-sentence document context during training. To address these problems, this paper presents new insights into Chinese NER and proposes 3Rs: three data augmentation methods that incorporate document-level information for NER through random concatenating, random swapping, and random erasing. The methods are inspired by multi-sample data augmentation techniques in computer vision and aim to reorganize the composition of training sentences and generate more training examples with less human effort. We conduct extensive experiments on two Chinese datasets and introduce a two-level attacking method to audit robustness. Our experimental results show that even the best model can obtain better accuracy and robustness, especially with smaller training sets, thereby alleviating performance bottlenecks in low-resource conditions.
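The abstract names the three operations but does not specify how they work, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes a document is a list of character-level sentences with BIO tags, that each operation works at sentence granularity, and that the augmented example is the flattened result. The names `flatten`, `random_concat`, `random_swap`, `random_erase`, and the toy `doc` are all hypothetical.

```python
import random
from typing import List, Tuple

# Assumption: a sentence is (characters, BIO tags); a document is a list of sentences.
Sentence = Tuple[List[str], List[str]]


def flatten(sentences: List[Sentence]) -> Sentence:
    """Join sentences into one training example, keeping tokens and tags aligned."""
    tokens: List[str] = []
    tags: List[str] = []
    for sent_tokens, sent_tags in sentences:
        tokens.extend(sent_tokens)
        tags.extend(sent_tags)
    return tokens, tags


def random_concat(doc: List[Sentence], rng: random.Random, k: int = 2) -> Sentence:
    """Concatenate k sentences drawn at random from the same document."""
    picked = rng.sample(doc, min(k, len(doc)))
    return flatten(picked)


def random_swap(doc: List[Sentence], rng: random.Random) -> Sentence:
    """Swap the positions of two randomly chosen sentences, then flatten."""
    out = list(doc)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return flatten(out)


def random_erase(doc: List[Sentence], rng: random.Random) -> Sentence:
    """Drop one randomly chosen sentence from the document, then flatten."""
    out = list(doc)
    if len(out) >= 2:
        del out[rng.randrange(len(out))]
    return flatten(out)


# Toy document (illustrative data only, not taken from the paper's datasets).
doc: List[Sentence] = [
    (list("北京大学在北京"), ["B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "B-LOC", "I-LOC"]),
    (list("张三来了"), ["B-PER", "I-PER", "O", "O"]),
    (list("他很忙"), ["O", "O", "O"]),
]

rng = random.Random(0)
augmented = [fn(doc, rng) for fn in (random_concat, random_swap, random_erase)]
```

Operating on whole sentences keeps every entity span intact, so the tag sequence stays valid without re-annotation; whether the paper works at this granularity is not stated in the abstract.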
Pages: 8