LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition

Cited by: 14
Authors
Liu, Mingyi [1]
Tu, Zhiying [1]
Zhang, Tong [1]
Su, Tonghua [1]
Xu, Xiaofei [1]
Wang, Zhongjie [1]
Affiliations
[1] Harbin Institute of Technology, Faculty of Computing, Harbin, People's Republic of China
Funding
National Science Foundation (United States);
Keywords
Active learning; Learning strategies; Named entity recognition; CRF;
DOI
10.1007/s11063-021-10737-x
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, deep learning has achieved great success in many natural language processing tasks, including named entity recognition. Its shortcoming is that it usually requires a large quantity of manually annotated data. Previous studies have demonstrated that active learning can considerably reduce the cost of data annotation, but there is still plenty of room for improvement. In real applications, we found that existing uncertainty-based active learning strategies have two shortcomings. First, these strategies explicitly or implicitly prefer long sequences, which increases the burden on annotators. Second, some strategies require modifying the model to generate additional information for sample selection, which increases the developer's workload and the model's training and prediction time. In this paper, we first examine traditional active learning strategies on several typical datasets for two models widely used in named entity recognition, Word2Vec-BiLSTM-CRF and BERT-CRF. Then, we propose an uncertainty-based active learning strategy called the lowest token probability (LTP), which combines the input and output of the conditional random field (CRF) to select informative instances. LTP is a simple and powerful strategy that does not favor long sequences and does not require modifying the model. We test LTP on multiple real-world datasets. The experimental results show that, compared with existing state-of-the-art selection strategies, LTP reduces the number of annotated tokens by about 20% while maintaining competitive performance on both sentence-level accuracy and entity-level F1-score. Additionally, LTP significantly outperforms all other strategies in selecting valid samples, which dramatically reduces the number of invalid annotations that labelers perform.
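As a rough illustration of the selection idea described in the abstract, the sketch below scores each unlabeled sentence by one minus the lowest per-token probability among its predicted labels, then picks the highest-scoring sentences for annotation. The abstract does not state the exact formula, so the scoring rule, the function names `ltp_score` and `select_batch`, and the toy probabilities are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def ltp_score(token_probs: Sequence[float]) -> float:
    """Assumed LTP-style uncertainty: 1 minus the lowest token probability.

    token_probs holds, for each token, the model's probability of the label
    it assigns in its most likely label sequence (hypothetical input format).
    """
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    return 1.0 - min(token_probs)


def select_batch(pool: List[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k sentences with the highest uncertainty."""
    ranked = sorted(range(len(pool)), key=lambda i: ltp_score(pool[i]), reverse=True)
    return ranked[:k]


# Toy pool of per-token probabilities (made-up numbers).
pool = [
    [0.99, 0.98, 0.97, 0.99, 0.98, 0.99, 0.97, 0.98],  # long, confident everywhere
    [0.95, 0.40, 0.90],                                # short, one uncertain token
    [0.90, 0.85],                                      # short, mildly uncertain
]
selected = select_batch(pool, k=2)
print(selected)
```

Because the score depends on the minimum rather than a sum over tokens, the long but confident sentence ranks below the short sentence with a single uncertain token, which matches the abstract's claim that the strategy does not favor long sequences.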
Pages: 2433-2454
Page count: 22
Related papers
50 in total
  • [1] LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition
    Mingyi Liu
    Zhiying Tu
    Tong Zhang
    Tonghua Su
    Xiaofei Xu
    Zhongjie Wang
    Neural Processing Letters, 2022, 54 : 2433 - 2454
  • [2] CRF-based Active Learning for Chinese Named Entity Recognition
    Yao, Lin
    Sun, Chengjie
    Li, Shaofeng
    Wang, Xiaolong
    Wang, Xuan
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 1557 - +
  • [3] CRF-Based Named Entity Recognition for Myanmar Language
    Mo, Hsu Myat
    Nwet, Khin Thandar
    Soe, Khin Mar
    GENETIC AND EVOLUTIONARY COMPUTING, 2017, 536 : 204 - 211
  • [4] Combining Knowledge and CRF-Based Approach to Named Entity Recognition in Russian
    Mozharova, V. A.
    Loukachevitch, N. V.
    ANALYSIS OF IMAGES, SOCIAL NETWORKS AND TEXTS, AIST 2016, 2017, 661 : 185 - 195
  • [5] A CRF-Based Stacking Model with Meta-features for Named Entity Recognition
    Liu, Shifeng
    Sun, Yifang
    Wang, Wei
    Zhou, Xiaoling
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938 : 54 - 66
  • [6] A CRF based Machine Learning Approach for Biomedical Named Entity Recognition
    Kanimozhi, U.
    Manjula, D.
    2017 SECOND INTERNATIONAL CONFERENCE ON RECENT TRENDS AND CHALLENGES IN COMPUTATIONAL MODELS (ICRTCCM), 2017, : 335 - 342
  • [7] CRF-Based Czech Named Entity Recognizer and Consolidation of Czech NER Research
    Konkol, Michal
    Konopik, Miloslav
    TEXT, SPEECH, AND DIALOGUE, TSD 2013, 2013, 8082 : 153 - 160
  • [8] Extending a CRF-based named entity recognition model for Turkish well formed text and user generated content
    Seker, Gokhan Akin
    Eryigit, Gulsen
    SEMANTIC WEB, 2017, 8 (05) : 625 - 642
  • [9] The Named Entity Recognition of Chinese Cybersecurity Using an Active Learning Strategy
    Xie, Bo
    Shen, Guowei
    Guo, Chun
    Cui, Yunhe
    WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2021, 2021
  • [10] Loss-based Active Learning for Named Entity Recognition
    Linh, Le Thai
    Nguyen, Minh-Tien
    Zuccon, Guido
    Demartini, Gianluca
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,