CRMSP: A semi-supervised approach for key information extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling

Authors
Zhang, Qi [1]
Song, Yonghong [1]
Guo, Pengcheng [1]
Hui, Yangyang [1]
Affiliations
[1] Xi'an Jiaotong University, School of Software Engineering, 28 Xianning West Road, Xi'an 710049, Shaanxi, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Semi-supervised learning; Key information extraction; Long-tailed distribution; Semantic Pseudo-Labeling
DOI
10.1016/j.neucom.2024.128907
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There is a growing demand in the field of Key Information Extraction (KIE) to apply semi-supervised learning (SSL) to save manpower and costs, because training on document data with fully supervised methods requires labor-intensive manual annotation. The main challenges of applying SSL to KIE are (1) underestimation of the confidence of tail classes in the long-tailed distribution and (2) difficulty in achieving intra-class compactness and inter-class separability of tail features. To address these challenges, we propose a novel semi-supervised approach for KIE with Class-Rebalancing and Merged Semantic Pseudo-Labeling (CRMSP). Firstly, the Class-Rebalancing Pseudo-Labeling (CRP) module introduces a reweighting factor to rebalance pseudo-labels, increasing attention to tail classes. Secondly, we propose the Merged Semantic Pseudo-Labeling (MSP) module to cluster tail features of unlabeled data by assigning samples to Merged Prototypes (MP). Additionally, we design a new contrastive loss specifically for MSP. Extensive experimental results on three well-known benchmarks demonstrate that CRMSP achieves state-of-the-art performance. Remarkably, CRMSP achieves a 3.24% F1-score improvement over the previous state of the art on CORD.
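The abstract does not give the form of the CRP reweighting factor, so the following is only a minimal sketch of the general idea of class-rebalanced pseudo-labeling, not the authors' method. It assumes an inverse-frequency weight with add-one smoothing; the function names, the `power` exponent, and the confidence `threshold` are all hypothetical choices made for illustration.

```python
from collections import Counter


def rebalancing_weights(pseudo_labels, num_classes, power=0.5):
    """Return one weight per class; rarer pseudo-label classes get larger weights.

    Assumption: weight = (max_count / class_count) ** power, with add-one
    smoothing. This is an illustrative choice, not the formula from the paper.
    """
    counts = Counter(pseudo_labels)
    # Add-one smoothing keeps the weight finite for classes with no pseudo-labels.
    freqs = [counts.get(c, 0) + 1 for c in range(num_classes)]
    max_freq = max(freqs)
    return [(max_freq / f) ** power for f in freqs]


def weighted_pseudo_label_loss(losses, pseudo_labels, confidences, weights,
                               threshold=0.9):
    """Average the per-sample losses of confident samples, scaled by class weight."""
    total, kept = 0.0, 0
    for loss, label, conf in zip(losses, pseudo_labels, confidences):
        if conf >= threshold:  # discard low-confidence pseudo-labels
            total += weights[label] * loss
            kept += 1
    return total / kept if kept else 0.0


if __name__ == "__main__":
    # Class 0 is the head class; classes 1 and 2 are tail classes.
    labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
    w = rebalancing_weights(labels, num_classes=3)
    print(w)  # weights increase from the head class to the rarest class
```

With these assumed weights, a confident pseudo-label from a tail class contributes more to the unlabeled loss than one from the head class, which is the effect the abstract attributes to the reweighting factor.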
Pages: 11