Semi-Supervised Sequence Labeling with Self-Learned Features

Cited by: 6
Authors
Qi, Yanjun [1 ]
Kuksa, Pavel [2 ]
Collobert, Ronan [1 ]
Sadamasa, Kunihiko [1 ]
Kavukcuoglu, Koray [3 ]
Weston, Jason [4 ]
Affiliations
[1] NEC Labs Amer Inc, Machine Learning Dept, Princeton, NJ 08540 USA
[2] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08901 USA
[3] NYU, Dept Comp Sci, New York, NY 10003 USA
[4] Google Res NY, New York, NY 10027 USA
Keywords
semi-supervised learning; semi-supervised feature learning; information extraction; structural output learning; sequence labeling; self-learned features; RECOGNITION;
DOI
10.1109/ICDM.2009.40
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Typical information extraction (IE) systems can be seen as tasks assigning labels to words in a natural language sequence. The performance is restricted by the availability of labeled words. To tackle this issue, we propose a semi-supervised approach to improve the sequence labeling procedure in IE through a class of algorithms with self-learned features (SLF). A supervised classifier can be trained with annotated text sequences and used to classify each word in a large set of unannotated sentences. By averaging predicted labels over all cases in the unlabeled corpus, SLF training builds class label distribution patterns for each word (or word attribute) in the dictionary and re-trains the current model iteratively, adding these distributions as extra word features. Basic SLF models how likely a word could be assigned to target class types. Several extensions are proposed, such as learning words' class boundary distributions. SLF exhibits robust and scalable behaviour and is easy to tune. We applied this approach to four classical IE tasks: named entity recognition (German and English), part-of-speech tagging (English) and one gene name recognition corpus. Experimental results show effective improvements over the supervised baselines on all tasks. In addition, when compared with the closely related self-training idea, this approach shows favorable advantages.
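The averaging step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `word_label_distributions` and `augment`, the toy label set, and the uniform fallback for words never seen in the unlabeled corpus are all assumptions made for illustration. The sketch takes the labels a current model has predicted on unannotated text, averages them per word, and appends the resulting distribution to a word's feature vector.

```python
from collections import defaultdict


def word_label_distributions(tagged_tokens, labels):
    """Average predicted labels per word over an unlabeled corpus.

    tagged_tokens: iterable of (word, predicted_label) pairs, assumed to
    come from running the current supervised model on unannotated text.
    Returns {word: [fraction of occurrences predicted as each label]}.
    """
    index = {label: i for i, label in enumerate(labels)}
    counts = defaultdict(lambda: [0] * len(labels))
    for word, label in tagged_tokens:
        counts[word][index[label]] += 1
    return {w: [c / sum(cs) for c in cs] for w, cs in counts.items()}


def augment(base_features, word, distributions, n_labels):
    """Append the word's label distribution to its base features.

    Assumption: unseen words fall back to a uniform distribution.
    """
    slf = distributions.get(word, [1.0 / n_labels] * n_labels)
    return list(base_features) + slf


# Toy example (hypothetical labels and predictions).
labels = ["PER", "LOC", "O"]
predictions = [
    ("Paris", "LOC"), ("Paris", "PER"), ("Paris", "LOC"),
    ("Paris", "LOC"), ("the", "O"),
]
dists = word_label_distributions(predictions, labels)
features = augment([1.0], "Paris", dists, len(labels))
```

In an iterative setting, the model would be re-trained on the augmented features, used to re-label the unannotated corpus, and the distributions recomputed; the stopping criterion is left open here because the abstract does not specify one.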
Pages: 428 / +
Page count: 3