Investigating annotation noise for named entity recognition

Cited by: 3
Authors
Zhu, Yu [1 ]
Ye, Yingchun [1 ]
Li, Mengyang [1 ,2 ]
Zhang, Ji [3 ]
Wu, Ou [1 ]
Affiliations
[1] Tianjin Univ, Natl Ctr Appl Math, Weijin Rd, Tianjin 300072, Peoples R China
[2] Jiuantianxia Inc, Jinguan North Second St, Beijing 100102, Peoples R China
[3] Zhejiang Lab, Wenyi West Rd, Hangzhou 311100, Zhejiang, Peoples R China
Source
NEURAL COMPUTING & APPLICATIONS | 2023 / Vol. 35 / Issue 01
Keywords
Information extraction; Named entity recognition; Noisy labels; Bayesian neural network;
DOI
10.1007/s00521-022-07733-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent studies have revealed that even the most widely used benchmark dataset for Named Entity Recognition (NER) still contains more than 5% sample-level annotation noise. Hence, we investigate annotation noise in terms of noise detection and noise-robust learning. First, considering that noisy labels usually occur when few or vague annotation cues appear in annotated texts and their contexts, an annotation noise detection model is constructed based on a self-context contrastive loss. Second, an improved Bayesian neural network (BNN) is presented by adding a learnable systematic deviation term to the label generation process of the classical BNN. In addition, two strategies for learning the systematic deviation term, both based on the output of the noise detection model, are proposed. Experimental results show that our proposed noise detection model improves F1 by up to 7.44% on CoNLL03 over the existing method. Extensive experiments on two widely used but noisy NER benchmarks, CoNLL03 and WNUT17, demonstrate that our proposed systematic deviation BNN has the potential to capture systematic annotation mistakes, and that it can be extended to other areas with annotation noise.
Pages: 993 - 1007
Page count: 15
Related papers
50 in total
  • [21] CLEANCONLL: A Nearly Noise-Free Named Entity Recognition Dataset
    Ruecker, Susanna
    Akbik, Alan
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 8628 - 8645
  • [22] Named Entity Recognition for Vietnamese
    Dat Ba Nguyen
    Son Huu Hoang
    Son Bao Pham
    Thai Phuong Nguyen
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, PT II, PROCEEDINGS, 2010, 5991 : 205 - 214
  • [23] Persian Named Entity Recognition
    Dashtipour, Kia
    Gogate, Mandar
    Adeel, Ahsan
    Algarafi, Abdulrahman
    Howard, Newton
    Hussain, Amir
    2017 IEEE 16TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2017, : 79 - 83
  • [24] Named Entity Recognition for Tweets
    Liu, Xiaohua
    Wei, Furu
    Zhang, Shaodian
    Zhou, Ming
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2013, 4 (01)
  • [25] NAMED ENTITY RECOGNITION FOR POLISH
    Marcinczuk, Michal
    Wawer, Aleksander
    POZNAN STUDIES IN CONTEMPORARY LINGUISTICS, 2019, 55 (02): : 239 - 269
  • [26] NAMED ENTITY RECOGNITION FOR ROMANIAN
    Iftene, Adrian
    Trandabat, Diana
    Toader, Mihai
    Corici, Marius
    KEPT 2011: KNOWLEDGE ENGINEERING PRINCIPLES AND TECHNIQUES, 2011, : 49 - 60
  • [27] An Overview of Named Entity Recognition
    Sun, Peng
    Yang, Xuezhen
    Zhao, Xiaobing
    Wang, Zhijuan
    2018 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2018, : 273 - 278
  • [28] Named Entity Recognition Approaches
    Mansouri, Alireza
    Affendey, Lilly Suriani
    Mamat, Ali
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2008, 8 (02): : 339 - 344
  • [29] Arabic Named Entity Recognition
    Benajiba, Yassine
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2010, (44): : 151 - 152
  • [30] Dynamic Named Entity Recognition
    Luiggi, Tristan
    Soulier, Laure
    Guigue, Vincent
    Jendoubi, Siwar
    Baelde, Aurelien
    38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023, 2023, : 890 - 897