LTP: A New Active Learning Strategy for CRF-Based Named Entity Recognition

Cited by: 14
Authors
Liu, Mingyi [1]
Tu, Zhiying [1]
Zhang, Tong [1]
Su, Tonghua [1]
Xu, Xiaofei [1]
Wang, Zhongjie [1]
Affiliation
[1] Harbin Institute of Technology, Faculty of Computing, Harbin, People's Republic of China
Funding
U.S. National Science Foundation;
Keywords
Active learning; Learning strategies; Named entity recognition; CRF;
DOI
10.1007/s11063-021-10737-x
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, deep learning has achieved great success in many natural language processing tasks, including named entity recognition. Its shortcoming is that it usually requires a large quantity of manually annotated data. Previous studies have demonstrated that active learning can considerably reduce the cost of data annotation, but there is still plenty of room for improvement. In real applications, we found that existing uncertainty-based active learning strategies have two shortcomings. First, these strategies prefer long sequences, explicitly or implicitly, which increases the burden on annotators. Second, some strategies require modifying the model to generate additional information for sample selection, which increases the developer's workload and the model's training and prediction time. In this paper, we first examine traditional active learning strategies on several typical datasets for two models widely used in named entity recognition, Word2Vec-BiLSTM-CRF and BERT-CRF. Then, we propose an uncertainty-based active learning strategy called the lowest token probability (LTP), which combines the input and output of the conditional random field (CRF) to select informative instances. LTP is a simple and powerful strategy that does not favor long sequences and does not require modifying the model. We test LTP on multiple real-world datasets. The experimental results show that, compared with existing state-of-the-art selection strategies, LTP reduces the number of annotated tokens by about 20% while maintaining competitive performance on both sentence-level accuracy and entity-level F1-score. Additionally, LTP significantly outperforms all other strategies in selecting valid samples, which dramatically reduces the number of invalid annotations made by the labelers.
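The length bias described in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation: the abstract does not say how LTP derives token probabilities from the CRF's input and output, so the per-token probabilities below are simply assumed to be given, and the function names `sequence_probability`, `lowest_token_probability`, and `select_batch` are hypothetical. The sketch only contrasts a product-style sequence score, which shrinks as sentences get longer, with a minimum-over-tokens score, which does not.

```python
import math
from typing import Callable, List, Sequence


def sequence_probability(token_probs: Sequence[float]) -> float:
    """Product of per-token probabilities (assumed given).

    Each extra token multiplies in a factor <= 1, so long sentences
    tend to look less confident regardless of their content.
    """
    return math.prod(token_probs)


def lowest_token_probability(token_probs: Sequence[float]) -> float:
    """Smallest per-token probability in the sentence (assumed given).

    The score depends on the single least confident token,
    not on how many tokens the sentence has.
    """
    return min(token_probs)


def select_batch(
    pool: List[Sequence[float]],
    k: int,
    score: Callable[[Sequence[float]], float],
) -> List[int]:
    """Return indices of the k sentences with the lowest confidence score."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]))
    return ranked[:k]


# Hypothetical unlabeled pool: one list of token probabilities per sentence.
pool = [
    [0.9] * 20,         # long sentence, every token fairly confident
    [0.95, 0.4, 0.9],   # short sentence with one very uncertain token
    [0.99, 0.98],       # short, confident sentence
]

by_product = select_batch(pool, 1, sequence_probability)
by_lowest_token = select_batch(pool, 1, lowest_token_probability)
```

Under these assumed numbers, the product score picks the 20-token sentence, while the minimum-token score picks the 3-token sentence that contains the uncertain token, so fewer tokens are sent for annotation.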
Pages: 2433-2454
Page count: 22