A template augmented distant supervision framework for Chinese named entity recognition

Cited by: 0
Authors
Qi, Chengwen [1 ,2 ]
Laili, Yuanjun [1 ,2 ]
Ren, Lei [1 ,2 ]
Zhang, Lin [1 ,2 ]
Li, Bowen [3 ]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 10019, Peoples R China
[2] Zhongguancun Lab, Beijing, Peoples R China
[3] Shanghai AI Lab West Bund, AI Ctr, 701 Yunjin Rd, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Chinese named entity recognition; template library; data augmentation; distant supervision;
DOI
10.1142/S1793962324500181
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Distant supervision is an efficient way to generate labeled instances for Named Entity Recognition (NER). However, it suffers from dictionary bias and ambiguous entities, which produce noisy and incomplete labels. To address this drawback, this paper proposes a template-augmented distant supervision framework that generates high-quality labeled training data with minimal human effort. Specifically, distant supervision is used to extract sentences that contain entities, and a pre-trained language model encodes these sentences. The encoded sentences are clustered, and three sentences are sampled from each cluster to form a seed template pool. The seed templates are then calibrated and decomposed to decouple the connection between their different parts. Finally, the seed templates and the entity dictionary are combined with a pre-trained language model to generate semantically coherent and precisely labeled training data. Experiments on the EC and NEWS datasets and on a practical electronic after-sale Q&A dataset, using multiple pre-trained language models, show that the proposed framework improves the F1 score of distantly supervised NER models by 7.9%-12.9%.
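The template idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the entity types, the slot representation, and all function names are assumptions made for illustration. It covers only dictionary-based distant labeling, turning a matched sentence into a slotted template, and refilling the slots from the dictionary to obtain character-level BIO labels. The encoding, clustering, calibration, and language-model generation steps are omitted.

```python
import random

# Hypothetical toy entity dictionary (surface form -> entity type).
ENTITY_DICT = {"华为": "BRAND", "小米": "BRAND", "手机": "PRODUCT", "耳机": "PRODUCT"}


def distant_label(sentence, entity_dict):
    """Greedy longest-match dictionary lookup; returns (start, end, type) spans."""
    spans = []
    max_len = max(len(key) for key in entity_dict)
    i = 0
    while i < len(sentence):
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + n]
            if candidate in entity_dict:
                spans.append((i, i + n, entity_dict[candidate]))
                i += n
                break
        else:
            # No dictionary entry starts at position i.
            i += 1
    return spans


def to_template(sentence, spans):
    """Replace each matched span with a typed slot, keeping the other text."""
    parts, prev = [], 0
    for start, end, etype in spans:
        if start > prev:
            parts.append(sentence[prev:start])
        parts.append(("SLOT", etype))
        prev = end
    if prev < len(sentence):
        parts.append(sentence[prev:])
    return parts


def fill_template(template, entity_dict, rng):
    """Fill slots with random same-type entities; returns (chars, BIO tags)."""
    by_type = {}
    for surface, etype in entity_dict.items():
        by_type.setdefault(etype, []).append(surface)
    chars, tags = [], []
    for part in template:
        if isinstance(part, tuple):
            etype = part[1]
            surface = rng.choice(sorted(by_type[etype]))
            chars.extend(surface)
            tags.extend(["B-" + etype] + ["I-" + etype] * (len(surface) - 1))
        else:
            chars.extend(part)
            tags.extend(["O"] * len(part))
    return chars, tags


seed_sentence = "华为手机很好用"
seed_spans = distant_label(seed_sentence, ENTITY_DICT)
seed_template = to_template(seed_sentence, seed_spans)
new_chars, new_tags = fill_template(seed_template, ENTITY_DICT, random.Random(0))
```

Because the labels come from the slots rather than from re-matching the generated text, every filled sentence is labeled exactly, which is the property the abstract attributes to template-based generation.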
Pages: 22