3Rs: Data Augmentation Techniques Using Document Contexts for Low-Resource Chinese Named Entity Recognition

Times cited: 0
Authors
Ying, Zheyu [1, 2]
Zhang, Jinglei [1, 2]
Xie, Rui [1]
Wen, Guochang [1, 2]
Xiao, Feng [1, 2]
Liu, Xueyang [1]
Zhang, Shikun [1]
Affiliations
[1] Peking Univ, Natl Engn Res Ctr Software Engn, Beijing, Peoples R China
[2] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
Keywords
Chinese NER; Data Augmentation; Document-Level; Adversarial Attack; Low-resource;
DOI
10.1109/IJCNN55064.2022.9892341
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With recent advances in neural networks and pre-training techniques, Chinese Named Entity Recognition (NER) has made great progress. However, NER systems still generalize poorly because annotated data are scarce, and current NER models mostly treat input sentences individually, which prevents them from exploiting cross-sentence document context during training. To address these problems, this paper presents new insights into Chinese NER and proposes 3Rs: three data augmentation methods that incorporate document-level information into NER through random concatenating, random swapping, and random erasing. Inspired by multi-sample data augmentation techniques in computer vision, these methods reorganize the composition of training sentences and generate more training examples with less human effort. We conduct extensive experiments on two Chinese datasets and introduce a two-level attacking method to audit robustness. Our experimental results show that even the best model achieves better accuracy and robustness, especially with smaller training sets, thereby alleviating performance bottlenecks under low-resource conditions.
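The abstract names the three operations but does not define them precisely. The sketch below is one plausible reading, not the authors' implementation. It assumes a document is a list of character-tokenized sentences with BIO tags and applies each operation at the sentence level within a single document. The function names and the `max_span` and `p` parameters are hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical representation: one sentence = (characters, BIO tags) of equal length;
# a document is an ordered list of such sentences.
Sentence = Tuple[List[str], List[str]]
Document = List[Sentence]


def random_concatenate(doc: Document, rng: random.Random, max_span: int = 3) -> Sentence:
    """Merge a random run of consecutive sentences from one document into one example."""
    span = rng.randint(1, min(max_span, len(doc)))
    start = rng.randint(0, len(doc) - span)
    chars: List[str] = []
    tags: List[str] = []
    for sent_chars, sent_tags in doc[start:start + span]:
        chars.extend(sent_chars)
        tags.extend(sent_tags)  # each sentence's BIO tags stay valid after joining
    return chars, tags


def random_swap(doc: Document, rng: random.Random) -> Document:
    """Return a copy of the document with two randomly chosen sentences exchanged."""
    out = list(doc)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_erase(doc: Document, rng: random.Random, p: float = 0.2) -> Document:
    """Drop each sentence from the context with probability p, keeping at least one."""
    kept = [sent for sent in doc if rng.random() >= p]
    return kept if kept else [rng.choice(doc)]


if __name__ == "__main__":
    doc = [
        (list("张三在北京"), ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]),
        (list("他很好"), ["O", "O", "O"]),
        (list("李四来了"), ["B-PER", "I-PER", "O", "O"]),
    ]
    rng = random.Random(0)
    # Chain the operations: erase, then swap, then concatenate into one training example.
    chars, tags = random_concatenate(random_swap(random_erase(doc, rng), rng), rng)
    print("".join(chars))
    print(tags)
```

Operating on whole sentences keeps every BIO sequence well-formed. Erasing individual characters, by contrast, would have to avoid cutting through an entity span.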
Pages: 8
Related papers
50 in total
  • [1] Improving Low-resource Named Entity Recognition with Graph Propagated Data Augmentation
    Cai, Jiong
    Huang, Shen
    Jiang, Yong
    Tan, Zeqi
    Xie, Pengjun
    Tu, Kewei
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 110 - 118
  • [2] Exogenous and Endogenous Data Augmentation for Low-Resource Complex Named Entity Recognition
    Zhang, Xinghua
    Chen, Gaode
    Cui, Shiyao
    Sheng, Jiawei
    Liu, Tingwen
    Xu, Hongbo
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 630 - 640
  • [3] Constrained Labeled Data Generation for Low-Resource Named Entity Recognition
    Guo, Ruohao
    Roth, Dan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4519 - 4533
  • [4] AUC Maximization for Low-Resource Named Entity Recognition
    Nguyen, Ngoc Dang
    Tan, Wei
    Du, Lan
    Buntine, Wray
    Beare, Richard
    Chen, Changyou
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 13389 - 13399
  • [5] Enhancement of Named Entity Recognition in Low-Resource Languages with Data Augmentation and BERT Models: A Case Study on Urdu
    Ullah, Fida
    Gelbukh, Alexander
    Zamir, Muhammad Tayyab
    Riveron, Edgardo Manuel Felipe
    Sidorov, Grigori
    COMPUTERS, 2024, 13 (10)
  • [6] Data Augmentation for Chinese Clinical Named Entity Recognition
    Wang P.-H.
    Li M.-Z.
    Li S.
    Li, Si (lisi@bupt.edu.cn), 1600, Beijing University of Posts and Telecommunications (43): : 84 - 90
  • [7] Data Augmentation Techniques on Arabic Data for Named Entity Recognition
    Sabty, Caroline
    Omar, Islam
    Wasfalla, Fady
    Islam, Mohamed
    Abdennadher, Slim
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 292 - 299
  • [8] Biomedical Named Entity Recognition Under Low-Resource Situation
    Zhao, Jianfei
    Ren, Xiangyu
    Zhao, Shuo
    Li, Jinyi
    HEALTH INFORMATION PROCESSING. EVALUATION TRACK PAPERS, 2023, 1773 : 41 - 47
  • [9] RoPDA: Robust Prompt-Based Data Augmentation for Low-Resource Named Entity Recognition
    Song, Sihan
    Shen, Furao
    Zhao, Jian
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19017 - 19025
  • [10] Converse Attention Knowledge Transfer for Low-Resource Named Entity Recognition
    School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China
    (Affiliation not specified), 639798, Singapore
    Int. J. Crowd. Sci., 2024, 3 (140-148):