An adaptive approach to noisy annotations in scientific information extraction

Cited by: 0
Authors
Bolucu, Necva [1]
Rybinski, Maciej [1]
Dai, Xiang [1]
Wan, Stephen [1]
Affiliations
[1] CSIRO, Data61, Sydney, NSW 2122, Australia
Keywords
Information extraction; Dataset; Mislabelled; Noisy; Weighted weakly supervised learning; Scientific
DOI
10.1016/j.ipm.2024.103857
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Despite recent advances in large language models (LLMs), the best effectiveness in information extraction (IE) is still achieved by fine-tuned models, hence the need for manually annotated datasets to train them. However, collecting human annotations for IE, especially for scientific IE, where expert annotators are often required, is expensive and time-consuming. Another issue widely discussed in the IE community is noisy annotations: mislabelled training samples can hamper the effectiveness of trained models. In this paper, we propose a solution to alleviate problems originating from the high cost and difficulty of the annotation process. Our method distinguishes clean training samples from noisy samples and then employs weighted weakly supervised learning (WWSL) to leverage noisy annotations. Evaluation on Named Entity Recognition (NER) and Relation Classification (RC) tasks in scientific IE demonstrates the substantial impact of detecting clean samples. Experimental results highlight that our method, utilising clean and noisy samples with WWSL, outperforms the baseline RoBERTa on the NER task (+4.28, +4.59, +29.27, and +5.21 gain for the ADE, SciERC, STEM-ECR, and WLPC datasets, respectively) and on the RC task (+6.09 and +4.39 gain for the SciERC and WLPC datasets, respectively). Comprehensive analyses of our method reveal its advantages over state-of-the-art denoising baseline models in scientific NER. Moreover, the framework is general enough to be adapted to different NLP tasks or domains, which means it could be useful in the broader NLP community.
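The abstract does not spell out how WWSL weights its samples, so the following is only a minimal sketch of the general idea of a per-sample weighted loss, not the authors' implementation. It assumes each training sample already carries a clean/noisy flag, that clean samples get weight 1.0, and that noisy samples get a smaller weight; the function names and the `noisy_weight` parameter are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, labels, is_clean, noisy_weight=0.3):
    """Weighted mean cross-entropy over a batch.

    Assumption (not from the paper): clean samples contribute with weight 1.0
    and samples flagged as noisy contribute with `noisy_weight`.
    """
    total_loss = 0.0
    total_weight = 0.0
    for logits, label, clean in zip(batch_logits, labels, is_clean):
        weight = 1.0 if clean else noisy_weight
        prob = softmax(logits)[label]
        total_loss += weight * -math.log(prob)
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("all sample weights are zero")
    return total_loss / total_weight


# Usage: two samples, the second flagged as noisy and therefore down-weighted.
loss = weighted_cross_entropy(
    batch_logits=[[2.0, 0.5], [0.1, 1.2]],
    labels=[0, 0],
    is_clean=[True, False],
)
print(f"weighted loss: {loss:.4f}")
```

Setting `noisy_weight` to 0.0 reduces this to training on clean samples only, while 1.0 treats every annotation as equally trustworthy; values in between keep the noisy annotations as a weaker training signal.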
Pages: 17