Exogenous and Endogenous Data Augmentation for Low-Resource Complex Named Entity Recognition

Authors
Zhang, Xinghua [1, 2]
Chen, Gaode [1, 2]
Cui, Shiyao [1, 2]
Sheng, Jiawei [1, 2]
Liu, Tingwen [1, 2]
Xu, Hongbo [1, 2]
Affiliations
[1] University of Chinese Academy of Sciences, School of Cyber Security, Beijing, People's Republic of China
[2] Chinese Academy of Sciences, Institute of Information Engineering, Beijing, People's Republic of China
Keywords
Knowledge Acquisition; Data Augmentation; Named Entity Recognition; Low-Resource Learning
DOI
10.1145/3626772.3657754
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Low-resource Complex Named Entity Recognition aims to detect entities that may take the form of any linguistic constituent, in scenarios with limited manually annotated data. Existing studies augment the text by substituting entities of the same type or by language modeling, but they suffer from lower-quality output and from the limited entity context patterns available in low-resource corpora. In this paper, we propose a novel data augmentation method, E²DA, that works from both exogenous and endogenous perspectives. For exogenous augmentation, we treat the limited manually annotated data as anchors and leverage the instruction-following capabilities of Large Language Models (LLMs) to expand the anchors, generating data that are highly dissimilar from the original anchor texts in both entity mentions and contexts. For endogenous augmentation, we explore diverse semantic directions in the implicit feature space of the original and expanded anchors. The two perspectives are complementary: the method not only continuously expands the global text-level space but also fully explores the local semantic space, yielding more diverse augmented data. Extensive experiments on 10 diverse datasets across various low-resource settings demonstrate that the proposed method significantly outperforms prior state-of-the-art data augmentation methods.
Pages: 630-640
Number of pages: 11
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, KSEM 2022, PT III, 2022, 13370 : 65 - 76