3Rs: Data Augmentation Techniques Using Document Contexts For Low-Resource Chinese Named Entity Recognition

Cited by: 0
Authors
Ying, Zheyu [1, 2]
Zhang, Jinglei [1, 2]
Xie, Rui [1]
Wen, Guochang [1, 2]
Xiao, Feng [1, 2]
Liu, Xueyang [1]
Zhang, Shikun [1]
Affiliations
[1] Peking University, National Engineering Research Center for Software Engineering, Beijing, People's Republic of China
[2] Peking University, School of Software and Microelectronics, Beijing, People's Republic of China
Keywords
Chinese NER; Data Augmentation; Document-Level; Adversarial Attack; Low-resource
DOI
10.1109/IJCNN55064.2022.9892341
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With recent advances in neural networks and pre-training techniques, Chinese Named Entity Recognition (NER) has made great progress. However, NER systems still generalize poorly because annotated data is scarce, and current NER models mostly process input sentences individually, which prevents them from further exploiting cross-sentence document context during training. To address these problems, this paper presents new insights into Chinese NER and proposes 3Rs: three data augmentation methods that incorporate document-level information into NER through random concatenating, random swapping and random erasing. Inspired by multi-sample data augmentation techniques in computer vision, these methods reorganize the composition of training sentences and generate more training examples with less human effort. We conduct extensive experiments on two Chinese datasets and introduce a two-level attacking method to audit robustness. Our experimental results show that even the best model achieves better accuracy and robustness, especially with smaller training sets, thereby alleviating performance bottlenecks under low-resource conditions.
Pages: 8
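The abstract names three document-level augmentation operations (random concatenating, random swapping and random erasing) but gives no procedural detail. The following is a minimal Python sketch, under our own assumptions and not the authors' implementation, of how such operations might act on a character-level, BIO-tagged Chinese document. The data layout (`Sentence`, `Document`), the window sizes, the choice to operate on whole sentences, and the function names `random_concatenate`, `random_swap` and `random_erase` are all hypothetical.

```python
import random
from typing import List, Tuple

# Assumed layout: a sentence is a list of (character, BIO tag) pairs,
# and a document is an ordered list of such sentences.
Sentence = List[Tuple[str, str]]
Document = List[Sentence]


def _window(doc: Document, rng: random.Random, size: int) -> Document:
    """Hypothetical helper: pick up to `size` consecutive sentences."""
    size = min(size, len(doc))
    start = rng.randrange(len(doc) - size + 1)
    return doc[start:start + size]


def _join(sentences: Document) -> Sentence:
    """Concatenate whole sentences; each tag travels with its character."""
    return [pair for sent in sentences for pair in sent]


def random_concatenate(doc: Document, rng: random.Random, size: int = 2) -> Sentence:
    """Merge neighbouring sentences so one example carries cross-sentence context."""
    return _join(_window(doc, rng, size))


def random_swap(doc: Document, rng: random.Random, size: int = 3) -> Sentence:
    """Swap two sentences inside a window, then concatenate the window."""
    window = list(_window(doc, rng, size))
    if len(window) >= 2:
        i, j = rng.sample(range(len(window)), 2)
        window[i], window[j] = window[j], window[i]
    return _join(window)


def random_erase(doc: Document, rng: random.Random, size: int = 3) -> Sentence:
    """Drop one whole sentence from a window (assumed sentence-level erasing)."""
    window = list(_window(doc, rng, size))
    if len(window) >= 2:
        del window[rng.randrange(len(window))]
    return _join(window)


if __name__ == "__main__":
    doc = [
        [("张", "B-PER"), ("三", "I-PER"), ("来", "O")],
        [("北", "B-LOC"), ("京", "I-LOC")],
        [("好", "O")],
    ]
    rng = random.Random(0)
    for op in (random_concatenate, random_swap, random_erase):
        print(op.__name__, op(doc, rng))
```

Operating on whole sentences is a deliberate simplification in this sketch: because tags move together with their characters, no entity span is ever split and every augmented example remains a valid BIO sequence. Erasing characters inside a sentence, which is closer to image-level random erasing, would need extra handling at entity boundaries.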