An adaptive approach to noisy annotations in scientific information extraction

Authors
Bolucu, Necva [1]
Rybinski, Maciej [1]
Dai, Xiang [1]
Wan, Stephen [1]
Affiliations
[1] CSIRO, Data61, Sydney, NSW 2122, Australia
Keywords
Information extraction; Dataset; Mislabelled; Noisy; Weighted weakly supervised learning; Scientific
DOI
10.1016/j.ipm.2024.103857
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Despite recent advances in large language models (LLMs), the best effectiveness in information extraction (IE) is still achieved by fine-tuned models, hence the need for manually annotated datasets to train them. However, collecting human annotations for IE, especially for scientific IE, where expert annotators are often required, is expensive and time-consuming. Another issue widely discussed in the IE community is noisy annotations: mislabelled training samples can hamper the effectiveness of trained models. In this paper, we propose a solution to alleviate problems originating from the high cost and difficulty of the annotation process. Our method distinguishes clean training samples from noisy samples and then employs weighted weakly supervised learning (WWSL) to leverage noisy annotations. Evaluation on Named Entity Recognition (NER) and Relation Classification (RC) tasks in scientific IE demonstrates the substantial impact of detecting clean samples. Experimental results highlight that our method, utilising clean and noisy samples with WWSL, outperforms the baseline RoBERTa on the NER (+4.28, +4.59, +29.27, and +5.21 gain for the ADE, SciERC, STEM-ECR, and WLPC datasets, respectively) and RC (+6.09 and +4.39 gain for the SciERC and WLPC datasets, respectively) tasks. Comprehensive analyses of our method reveal its advantages over state-of-the-art denoising baseline models in scientific NER. Moreover, the framework is general enough to be adapted to different NLP tasks or domains, which means it could be useful in the broader NLP community.
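The two-stage idea in the abstract (separate clean from noisy samples, then down-weight the noisy ones during training) can be illustrated with a minimal sketch. The abstract does not say how clean samples are detected or how weights are assigned, so the small-loss threshold, the fixed `noisy_weight`, and all function names below are illustrative assumptions, not the authors' method.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the annotated label under predicted probabilities."""
    return -math.log(max(probs[label], 1e-12))

def split_clean_noisy(losses, threshold):
    """Assumed heuristic: treat a sample as clean when its loss is at or below the threshold."""
    return [loss <= threshold for loss in losses]

def weighted_loss(probs_batch, labels, threshold=1.0, noisy_weight=0.3):
    """Weighted mean loss: clean samples get weight 1.0, noisy ones get noisy_weight.

    Both threshold and noisy_weight are hypothetical hyperparameters.
    Returns the weighted loss and the per-sample clean flags.
    """
    losses = [cross_entropy(p, y) for p, y in zip(probs_batch, labels)]
    is_clean = split_clean_noisy(losses, threshold)
    weights = [1.0 if clean else noisy_weight for clean in is_clean]
    total = sum(w * l for w, l in zip(weights, losses))
    return total / sum(weights), is_clean

# Toy batch: the third annotation disagrees strongly with the model's prediction.
probs_batch = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05]]
labels = [0, 1, 1]
loss, is_clean = weighted_loss(probs_batch, labels)
print(is_clean, loss)
```

In this toy batch the third sample is flagged as noisy, so it still contributes to the loss but with a reduced weight, which is the general sense in which noisy annotations are leveraged rather than discarded.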
Pages: 17