Exogenous and Endogenous Data Augmentation for Low-Resource Complex Named Entity Recognition

Times cited: 0
Authors
Zhang, Xinghua [1, 2]
Chen, Gaode [1, 2]
Cui, Shiyao [1, 2]
Sheng, Jiawei [1, 2]
Liu, Tingwen [1, 2]
Xu, Hongbo [1, 2]
Affiliations
[1] University of Chinese Academy of Sciences, School of Cyber Security, Beijing, People's Republic of China
[2] Chinese Academy of Sciences, Institute of Information Engineering, Beijing, People's Republic of China
Keywords
Knowledge Acquisition; Data Augmentation; Named Entity Recognition; Low-Resource Learning
DOI
10.1145/3626772.3657754
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Low-resource Complex Named Entity Recognition aims to detect entities that may take the form of any linguistic constituent in scenarios with limited manually annotated data. Existing studies augment text by substituting entities of the same type or by language modeling, but they suffer from the low quality of the augmented data and from the limited entity context patterns available in low-resource corpora. In this paper, we propose E²DA, a novel data augmentation method that works from both exogenous and endogenous perspectives. For exogenous augmentation, we treat the limited manually annotated data as anchors and leverage the strong instruction-following capabilities of Large Language Models (LLMs) to expand them, generating data whose entity mentions and contexts are highly dissimilar from those of the original anchor texts. For endogenous augmentation, we explore diverse semantic directions in the implicit feature space of both the original and the expanded anchors. The two perspectives are complementary: the method not only continuously expands the global text-level space but also thoroughly explores the local semantic space, yielding more diverse augmented data. Extensive experiments on 10 diverse datasets across various low-resource settings show that the proposed method significantly outperforms prior state-of-the-art data augmentation methods.
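The abstract does not say how "highly dissimilar in terms of entity mentions and contexts" is measured. As a rough illustration only, the sketch below filters LLM-generated candidate sentences against one anchor: a candidate is kept if it shares no entity mention with the anchor and its non-entity context tokens have low Jaccard overlap with the anchor's. The `Example` type, the helper names, the Jaccard criterion, and the 0.3 threshold are all hypothetical assumptions, not the E²DA procedure; the LLM generation step itself is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    """Hypothetical container: a tokenized sentence plus its entity mention strings."""
    tokens: list[str]
    mentions: set[str]


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def context_tokens(ex: Example) -> set[str]:
    """Lower-cased tokens of the sentence that do not belong to any entity mention."""
    mention_tokens = {t.lower() for m in ex.mentions for t in m.split()}
    return {t.lower() for t in ex.tokens} - mention_tokens


def keep_dissimilar(anchor: Example, candidates: list[Example],
                    max_context_sim: float = 0.3) -> list[Example]:
    """Assumed criterion: all mentions are new AND context overlap is at most the threshold."""
    anchor_ctx = context_tokens(anchor)
    anchor_mentions = {m.lower() for m in anchor.mentions}
    kept = []
    for cand in candidates:
        mentions_are_new = {m.lower() for m in cand.mentions}.isdisjoint(anchor_mentions)
        ctx_sim = jaccard(anchor_ctx, context_tokens(cand))
        if mentions_are_new and ctx_sim <= max_context_sim:
            kept.append(cand)
    return kept


# Toy usage: one anchor and three hand-written stand-ins for LLM outputs.
anchor = Example("I watched The Lord of the Rings last night".split(), {"The Lord of the Rings"})
cand1 = Example("I watched The Lord of the Rings yesterday".split(), {"The Lord of the Rings"})  # same mention
cand2 = Example("Critics praised Spirited Away for its animation".split(), {"Spirited Away"})   # new mention, new context
cand3 = Example("I watched Spirited Away last night".split(), {"Spirited Away"})                # new mention, same context

result = keep_dissimilar(anchor, [cand1, cand2, cand3])
print([" ".join(ex.tokens) for ex in result])  # ['Critics praised Spirited Away for its animation']
```

Only the candidate that changes both the mention and its surrounding context survives; a real system would likely use embedding-based similarity rather than token overlap.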
Pages: 630-640
Number of pages: 11