CoTea: Collaborative teaching for low-resource named entity recognition with a divide-and-conquer strategy

Authors
Yang, Zhiwei [1, 2]
Ma, Jing [3]
Yang, Kang [4]
Lin, Huiru [5]
Chen, Hechang [4]
Yang, Ruichao [3]
Chang, Yi [4, 6]
Affiliations
[1] Jinan Univ, Guangdong Inst Smart Educ, Guangzhou, Peoples R China
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun, Peoples R China
[3] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China
[4] Jilin Univ, Sch Artificial Intelligence, Changchun, Peoples R China
[5] Jinan Univ, Inst Phys Educ, Guangzhou, Peoples R China
[6] Jilin Univ, Int Ctr Future Sci, Changchun, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Low resource; Named entity recognition; Collaborative teaching; Divide-and-conquer;
DOI
10.1016/j.ipm.2024.103657
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Low-resource named entity recognition (NER) aims to identify entity mentions when training data is scarce. Recent approaches resort to distant data with manual dictionaries for improvement, but such dictionaries are not always available for the target domain and have limited coverage of entities, which may introduce noise. In this paper, we propose a novel Collaborative Teaching (CoTea) framework for low-resource NER with a few supporting labeled examples, which can automatically augment training data and reduce label noise. Specifically, CoTea utilizes the entities in the supporting labeled examples to retrieve entity-related unlabeled data heuristically and then generates accurate distant labels with a novel mining-refining iterative mechanism. For optimizing distant labels, the mechanism mines potential entities from non-entity tokens with a recognition teacher and then refines entity labels with another prompt-based discrimination teacher in a divide-and-conquer manner. Experimental results on two benchmark datasets demonstrate that CoTea outperforms state-of-the-art baselines in low-resource settings and achieves 85% and 65% performance levels of the best high-resource baseline methods by merely utilizing about 2% of labeled data.
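The retrieve, mine, and refine steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`retrieve`, `mine_refine`), the capitalization-based recognizer, the set-lookup discriminator, and the fixed number of rounds are all hypothetical stand-ins chosen only to show the control flow, where the real teachers would be trained models.

```python
from typing import Callable, List, Set

Tokens = List[str]
Labels = List[str]  # one tag per token; "O" marks a non-entity token


def retrieve(support_entities: Set[str], unlabeled: List[Tokens]) -> List[Tokens]:
    """Assumed heuristic: keep sentences mentioning at least one support entity."""
    return [s for s in unlabeled if any(tok in support_entities for tok in s)]


def mine_refine(
    tokens: Tokens,
    labels: Labels,
    recognizer: Callable[[Tokens], Labels],
    discriminator: Callable[[Tokens, int, str], bool],
    rounds: int = 2,
) -> Labels:
    """Alternate mining and refining for a fixed number of rounds (assumption)."""
    labels = list(labels)
    for _ in range(rounds):
        # Mining: the recognition teacher may only upgrade tokens tagged "O".
        proposed = recognizer(tokens)
        for i, tag in enumerate(labels):
            if tag == "O" and proposed[i] != "O":
                labels[i] = proposed[i]
        # Refining: each entity label is checked on its own (divide and conquer).
        for i, tag in enumerate(labels):
            if tag != "O" and not discriminator(tokens, i, tag):
                labels[i] = "O"
    return labels


# Hypothetical toy teachers, standing in for trained models.
def toy_recognizer(tokens: Tokens) -> Labels:
    return ["LOC" if t[0].isupper() else "O" for t in tokens]


VERIFIED = {"Paris", "Versailles"}  # made-up data for the demo


def toy_discriminator(tokens: Tokens, i: int, tag: str) -> bool:
    return tokens[i] in VERIFIED


support_entities = {"Paris"}
unlabeled = [
    ["Yesterday", "Paris", "hosted", "Versailles", "visitors"],
    ["it", "rains", "today"],
]
retrieved = retrieve(support_entities, unlabeled)
sentence = retrieved[0]
# Initial distant labels: only tokens matching a support entity are tagged.
initial = ["LOC" if t in support_entities else "O" for t in sentence]
refined = mine_refine(sentence, initial, toy_recognizer, toy_discriminator)
print(refined)
```

In this toy run the recognizer mines "Versailles" (correct) and "Yesterday" (spurious) from the non-entity tokens, and the discriminator then removes the spurious label, which mirrors the noise-reduction role the abstract assigns to the second teacher.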
Pages: 17
Related papers
48 in total
  • [31] Improving the Teaching of Hypothesis Testing Using a Divide-and-Conquer Strategy and Content Exposure Control in a Gamified Environment
    Delgado-Gomez, David
    Gonzalez-Landero, Franks
    Montes-Botella, Carlos
    Sujar, Aaron
    Bayona, Sofia
    Martino, Luca
    MATHEMATICS, 2020, 8 (12) : 1 - 14
  • [32] Language inference-based learning for Low-Resource Chinese clinical named entity recognition using language model
    Cui, Zhaojian
    Yu, Kai
    Yuan, Zhenming
    Dong, Xiaofeng
    Luo, Weibin
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 149
  • [33] 3Rs: Data Augmentation Techniques Using Document Contexts For Low-Resource Chinese Named Entity Recognition
    Ying, Zheyu
    Zhang, Jinglei
    Xie, Rui
    Wen, Guochang
    Xiao, Feng
    Liu, Xueyang
    Zhang, Shikun
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [34] Improving Low-Resource Chinese Named Entity Recognition Using Bidirectional Encoder Representation from Transformers and Lexicon Adapter
    Dang, Xiaochao
    Wang, Li
    Dong, Xiaohui
    Li, Fenfang
    Deng, Han
    APPLIED SCIENCES-BASEL, 2023, 13 (19):
  • [35] Unsupervised Paraphrasing Consistency Training for Low Resource Named Entity Recognition
    Wang, Rui
    Henao, Ricardo
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5303 - 5308
  • [36] Robust and Informative Text Augmentation (RITA) via Constrained Worst-Case Transformations for Low-Resource Named Entity Recognition
    Sohn, Hyunwoo
    Park, Baekkwan
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 1616 - 1624
  • [37] A Little Annotation does a Lot of Good: A Study in Bootstrapping Low-resource Named Entity Recognizers
    Chaudhary, Aditi
    Xie, Jiateng
    Sheikh, Zaid
    Neubig, Graham
    Carbonell, Jaime G.
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 5164 - 5174
  • [38] PDALN: Progressive Domain Adaptation over a Pre-trained Model for Low-Resource Cross-Domain Named Entity Recognition
    Zhang, Tao
    Xia, Congying
    Yu, Philip S.
    Liu, Zhiwei
    Zhao, Shu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 5441 - 5451
  • [39] Low Resource Chinese Geological Text Named Entity Recognition Based on Prompt Learning
    He, Hang
    Ma, Chao
    Ye, Shan
    Tang, Wenqiang
    Zhou, Yuxuan
    Yu, Zhen
    Yi, Jiaxin
    Hou, Li
    Hou, Mingcai
    JOURNAL OF EARTH SCIENCE, 2024, 35 (03) : 1035 - 1043
  • [40] RoPDA: Robust Prompt-Based Data Augmentation for Low-Resource Named Entity Recognition
    Song, Sihan
    Shen, Furao
    Zhao, Jian
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19017 - 19025